Abstract 

Terms of Church's λ-calculus can be considered equivalent along many different definitions, but context equivalence is certainly the most direct and universally accepted one. If the underlying calculus becomes probabilistic, however, equivalence is too discriminating: terms which have totally unrelated behaviours are treated the same as terms which behave very similarly. We study the problem of evaluating the distance between affine λ-terms. The most natural definition of such a distance, namely the natural generalisation of context equivalence, is shown to be characterised by a notion of trace distance, and to be bounded from above by a coinductively defined distance based on the Kantorovich metric on distributions. A different, again fully-abstract, tuple-based notion of trace distance is shown to be able to handle nontrivial examples. 


1 Introduction 

Probabilistic models are formidable tools when abstracting the behaviour of complicated, intractable systems by simpler ones, at the price of introducing uncertainty. But there is more: randomness can be seen as a way to compute. In modern cryptography, as an example, having access to a source of uniform randomness is essential to achieve security in an asymmetric setting [14]. Other domains where probabilistic models play a key role include machine learning [24], robotics [27], and linguistics [21]. 

Probabilistic models of computation have been studied not only directly, but also through concrete or abstract programming languages, which are most often extensions of their deterministic siblings. Among the many ways probabilistic choice can be captured in programming, the simplest one consists in endowing the language of programs with an operator modelling the flipping of a fair coin. This renders program evaluation a probabilistic process, and under mild assumptions the language becomes universal for probabilistic computation. Particularly fruitful in this sense has been the line of work on the functional paradigm, both at a theoretical […] and at a more practical level […]. 

In presence of higher-order functions, program equivalence can be captured by so-called context equivalence: two programs $M$ and $N$ are considered equivalent if they behave the same no matter how the environment interacts with them, i.e., if for every context $C$ it holds that $\mathrm{Obs}(C[M]) = \mathrm{Obs}(C[N])$. However, this definition has the drawback of being based on a universal quantification over all contexts: showing that two programs are equivalent requires considering their interaction with every possible context. The problem of giving handier characterisations of context equivalence can be approached in many different ways. As an example, coinductive methodologies for program equivalence have been studied thoroughly in deterministic […] and non-deterministic [19] computation, with new and exciting results appearing recently also for probabilistic languages: applicative bisimilarity, a coinductively defined notion of equivalence for functional programs, has been shown to be sound, and sometimes even fully abstract, for probabilistic λ-calculi […]. 
In a probabilistic setting, however, equivalences are too strong if defined as above. Indeed, two programs are equivalent only if their probabilistic behaviour is exactly the same (in every context). The actual value of probabilities in a probabilistic model often comes from statistical measurements, and should be considered more as an approximation to the actual probability law. Consequently, we would like to compare programs by appropriately reflecting small variations in them. Another scenario in which a richer, more informative way of comparing programs is needed is cryptography, where a central notion of equivalence, called computational indistinguishability [13], is indeed based on statistical distance rather than equality: the adversary can win the game, but only with a small probability. Summing up, equivalences should be refined into metrics, and this is the path we will follow in this paper. 

In probabilistic λ-calculi, the notion of observation $\mathrm{Obs}(\cdot)$ is quantitative: it is either the probability of convergence to a certain observable base value (e.g. the empty string), or the probability of convergence tout court. One can then easily define a notion of context distance as the maximal distance contexts can achieve when separating two terms: 

$$\delta^{ctx}(M,N) = \sup_C |\mathrm{Obs}(C[M]) - \mathrm{Obs}(C[N])|.$$

This looks very close to computational indistinguishability, except for the absence of a security 
parameter: a scheme is secure if the advantage of any adversary in a given game (e.g., consisting 
in distinguishing between the case where the scheme is used, and the case where it is replaced by 
a truly random process) is “small” (e.g., negligible). Again, however, we find ourselves in front of 
a definition which risks being useless in proofs, given that all contexts must be taken into account. But how difficult is evaluating the distance between concrete higher-order terms? Are there ways to alleviate the burden of dealing with all contexts, as there are for equivalences? These are the questions we address in this paper, and which, to the authors' knowledge, have not been investigated before. 

As we will discuss in Section 2 below, finding handier characterisations of the context distance poses challenges which are simply different from (and often harder than) the ones encountered in context equivalence. In particular, the context distance tends to trivialise and, perhaps worse, naively applying the natural generalisation of techniques known for equivalence is bound to lead to unsound methodologies. Indeed, one immediately realises that the number of times contexts access their argument is a crucial parameter, which must necessarily be dealt with. This is the reason why we work with an affine λ-calculus in this paper: this is a necessary first step, but it also points to the right way to tame the general, non-linear case. 

An extended version of this paper with more details is available . 

Contributions 

We introduce in this paper three distinct notions of distance for terms in an untyped, probabilistic, and affine λ-calculus. The first one is a notion of trace distance, in which terms are faced with linear tests, i.e., sequences of arguments. The distance between two terms is then defined as the greatest separation any linear test achieves. The first result of this paper is the non-expansiveness of the trace distance, which implies (given that any linear test can easily be implemented by an affine context) that the trace and context distances coincide. This is the topic of Section 4 below. 

Section 5, instead, focuses on another notion of distance, which is coinductively defined following the well-known Kantorovich metric […] for distributions of states in any labelled Markov chain (LMC in the following), and which we dub the bisimulation distance. This second distance is not only an upper bound on the trace distance, which is well expected, but also non-expansive itself. This is proved by a variation on Howe's method […], a well-known technique for proving that bisimilarity is a congruence in a higher-order setting, and which has never been used for metrics before. On the other hand, the bisimulation distance does not coincide with the context distance, a fact that we not only prove by giving a counterexample, but also justify by relying on a test-based characterisation of the bisimulation distance known from the literature. 

For the sake of simplicity, the trace and bisimulation distances are analysed on a purely applicative λ-calculus, keeping in mind that pairs could be very easily handled, and can even be encoded in the applicative fragment, as discussed later in the paper. The presence of pairs, however, allows us to form very interesting examples of distance problems, one of which will drive us throughout the paper but unfortunately turns out to be hard to handle by either the trace distance or the bisimulation distance. This is the starting point for the third notion of distance introduced in this paper, which is the subject of Section 6 and which we call the tuple distance. Our third notion of distance can be proved to coincide with the trace distance, and thus with the context distance. But this is not the end of the story: in the tuple distance, not a single term but many terms are compared, and this makes the distance between concrete terms much easier to evaluate: interaction is somehow internalised. In particular, our running example can be handled quite easily. The way the tuple distance is defined makes it adaptable to non-affine calculi, a topic which is outside the scope of this paper, but which we briefly discuss in Section 6. 

Related Work 

This is definitely not the first work on metrics for probabilistic systems: notions of coinductively defined metrics for LMCs, as an example, have been extensively studied (e.g. […]). There have been, to our knowledge, few investigations on the meaning of metrics for concrete programming languages, and almost none on metrics for higher-order languages. 

If the key property that notions of equivalence are required to satisfy is being a congruence, the corresponding property for metrics has traditionally been taken to be non-expansiveness. Indeed, many results from the literature (e.g. […]) have precisely the form of non-expansiveness results for metrics defined in various forms. The underlying language, however, invariably takes the form of a process algebra without any higher-order feature. The work of Gebler, Tini, and co-authors shows that one could go beyond non-expansiveness and towards uniform continuity [12] but, again, higher-order functions remain out of scope. 

Notions of equivalence for various forms of probabilistic λ-calculi have also been extensively studied, starting from the pioneering work by Plotkin and Jones […], down to recent results on probabilistic applicative bisimulation […], logical relations [3], and probabilistic coherent spaces […]. None of the works above, however, goes beyond equivalences and deals with notions of distance between terms. 


2 The Anatomy of a Distance 

In this section, we describe the difficulties one encounters when trying to characterise the context 
distance with either bisimulation or trace metrics. 

Suppose we have two terms $M$ and $N$ of boolean type written in a probabilistic λ-calculus. As such, $M$ and $N$ do not evaluate to a value in the domain of booleans but to a distribution over the same domain. $M$ evaluates to the distribution assigning probability 1 to true, while $N$ evaluates to the uniform distribution over booleans (i.e., the distribution which attributes probability $\frac12$ to both true and false). Figure 1 depicts the relevant fragment of an LMC, whose induced notion of probabilistic bisimilarity has been proved to be sound for context equivalence […]. $M$ and $N$ are not bisimilar. Indeed, true and false are trivially not bisimilar, while $M$ and $N$ go to equivalent states with different probabilities. The two terms are not contextually equivalent either. But what should the distance between $M$ and $N$ be? 

For the moment, let us forget about the context distance, and concentrate on the notions of distance for LMCs we mentioned in Section 1. In all cases we are aware of, we obtain that $M$ and $N$ are at distance $\frac12$. As an example, if we consider a trace metric, we have to compare the success probability of linear tests, starting from $M$ and $N$. More precisely, the tests of interest with respect to these two terms are: 

$$t := \mathit{eval}; \qquad s := \mathit{eval}\cdot\mathit{true}; \qquad r := \mathit{eval}\cdot\mathit{false}.$$

Since neither $M$ nor $N$ has a non-zero divergence probability, they both pass the test $t$ with probability 1. The success probability of the test $s$ corresponds to the probability of evaluating to true: it is 1 for $M$ and $\frac12$ for $N$. Similarly, the success probability of $r$ corresponds to the probability of obtaining false after evaluation: it is 0 for $M$ and $\frac12$ for $N$. So the maximal separation linear tests can obtain is $\frac12$. The situation is quite similar for bisimulation metrics […], which attribute distance $\frac12$ to $M$ and $N$. 
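The computation above can be replayed mechanically. The following Python sketch recomputes the separation achieved by each test. It is only an illustration under simplifying assumptions: each boolean program is represented directly by its output distribution, and the names `M`, `N`, `TESTS` and `success` are hypothetical.

```python
from fractions import Fraction

# Hypothetical encoding: a boolean program is represented by its output
# distribution over {"true", "false"}; missing mass would be divergence.
M = {"true": Fraction(1)}
N = {"true": Fraction(1, 2), "false": Fraction(1, 2)}

# Linear tests: None stands for `eval` alone, a boolean for `eval . b`.
TESTS = {"t": None, "s": "true", "r": "false"}


def success(dist, test):
    """Probability that a program with output `dist` passes `test`."""
    if test is None:
        return sum(dist.values(), Fraction(0))  # converges at all
    return dist.get(test, Fraction(0))          # converges to `test`


separation = {name: abs(success(M, t) - success(N, t))
              for name, t in TESTS.items()}
trace_distance = max(separation.values())
print(separation, trace_distance)
```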

It is easy, however, to find a family of contexts $\{C_n\}_{n\in\mathbb N}$ such that $C_n[M]$ evaluates to true with probability 1, and $C_n[N]$ evaluates to false with probability $1 - \frac{1}{2^n}$: define $C_n$ as a context that copies its argument $n$ times, returns false if at least one of the $n$ copies evaluates to false, and returns true otherwise. As a consequence, the context distance between $M$ and $N$ is 1, the supremum of $1 - \frac{1}{2^n}$ over all $n$. In fact, this reasoning can be extended to any pair of programs which are not equivalent but whose probability of convergence is 1: out of a context which separates them by $\varepsilon > 0$, with $\varepsilon$ very small, we can construct contexts that separate them by amounts arbitrarily close to 1 by performing some statistical reasoning. The situation is more complicated if we take the probability of convergence as an observable: we cannot always construct contexts that discriminate terms based on their probability of convergence, although something can be done if the terms' probabilities of convergence are different but close to 1. The context metric, in other words, risks being discontinuous and close to trivial if contexts are too powerful. What the example above shows, however, is something even worse: if contexts are allowed to copy their arguments, then any metric defined upon the usual presentation of the probabilistic λ-calculus as an LMC (a fragment of which is depicted in Figure 1) is bound to be unsound w.r.t. the context metric. 
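The amplification argument can be made concrete with a small Python sketch. It assumes, hypothetically, that $C_n$ may evaluate $n$ independent copies of its (non-affine) argument, and it only models the resulting probabilities. `copy_context_true` and `separation` are illustrative names, not constructions from the paper.

```python
from fractions import Fraction

# Output distributions of the two boolean programs (hypothetical encoding).
M = {"true": Fraction(1)}
N = {"true": Fraction(1, 2), "false": Fraction(1, 2)}


def copy_context_true(dist, n):
    """Probability that C_n returns true: it evaluates n independent copies of
    its argument and returns true only if no copy yields false. Both programs
    always converge, so this is simply P(true) ** n."""
    return dist.get("true", Fraction(0)) ** n


def separation(n):
    """|P(C_n[M] returns true) - P(C_n[N] returns true)|, i.e. 1 - 1/2^n."""
    return abs(copy_context_true(M, n) - copy_context_true(N, n))


for n in (1, 2, 5, 10):
    print(n, separation(n))
```

The separation $1 - \frac{1}{2^n}$ grows with $n$ and tends to 1. This is why the supremum over all copying contexts is 1, even though no single linear test separates the two programs by more than $\frac12$.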

Whether bisimulation metrics are sound, how close they are to the context distance, and whether they are useful in relieving the burden of evaluating it, are however open and interesting questions even in the absence of the copying capability, i.e., when the underlying language is affine. This is the main reason why we focus in this work on such a λ-calculus, whose expressive power is limited (although definitely non-trivial [20]) but which is nonetheless higher-order. In this way we discover an elegant and deep theory in which trace and bisimulation metrics are indeed sound. At the end of this paper, some hints will be given about how the general, non-affine case of the untyped λ-calculus can be handled, a problem which we leave for future work. 

Evaluating the context distance between affine terms is already an interesting and nontrivial problem. Consider, as an example, a sequence of terms $\{M_n\}_{n\in\mathbb N}$ defined inductively as follows (where $\Omega$ stands for a term with zero probability of converging): 

$$M_0 = (\lambda x.\Omega, \lambda x.\Omega); \qquad M_{n+1} = (\lambda x.M_n, \lambda x.\Omega).$$

$M_0$ is the pair whose components are both equal to $\lambda x.\Omega$, and $M_{n+1}$ is defined as a pair whose first component is the function which returns $M_n$ whatever its argument is, and whose second component is again $\lambda x.\Omega$. We are now going to define another sequence of terms $\{N_n\}_{n\in\mathbb N}$ which can be seen as a noisy variation on $\{M_n\}_{n\in\mathbb N}$. More precisely, $N_0$ is the same as $M_0$, and for each $n \in \mathbb N$, $N_{n+1}$ is constructed similarly to $M_{n+1}$, but adding some negligible noise, of probability $p_{n+1}$, in both components: 

$$N_0 = (\lambda x.\Omega, \lambda x.\Omega); \qquad N_{n+1} = (\lambda x.(N_n \oplus_{p_{n+1}} \Omega), \lambda x.(\Omega \oplus_{p_{n+1}} I)).$$

($I$ stands for the identity $\lambda x.x$, while the term $L \oplus_p K$ has the same behaviour as $L$ with probability $1-p$, and the same behaviour as $K$ with probability $p$.) We would like to study how the distance between $M_n$ and $N_n$ evolves when $n$ tends to infinity: do the little differences we apply at each step $n$ accumulate, and how can we express this accumulation quantitatively? 

Intuitively, it is easy for the environment to separate $M_n$ and $N_n$ (for $n \ge 1$) by $p_n$, the probability of the noise added to the second component of $N_n$. It is enough to consider a context $C$ which simply takes the second component of the pair, passes any argument to it, and evaluates it: the convergence probability of $C[M_n]$ is 0, while the convergence probability of $C[N_n]$ is $p_n$. But the environment can also decide to take the first component of the pair, in order to exploit the fact that $M_{n-1}$ and $N_{n-1}$ can be distinguished. More precisely, let us suppose that we have a context $C$ which separates $M_{n-1}$ and $N_{n-1}$. Then we can construct a context $D$ which takes the first element of the pair, passes any argument to it, tries to evaluate it, and, if it succeeds, gives the result as an argument to $C$. We would like to express the supremum of the separation that such a context can obtain as a function of the distance between $M_{n-1}$ and $N_{n-1}$. Unfortunately, this is not so simple. If $C$ is such that the convergence probability of $C[M_{n-1}]$ is $\varepsilon$ and the convergence probability of $C[N_{n-1}]$ is $\ell$, we can see that the convergence probability of $D[M_n]$ is $\varepsilon$, whereas the convergence probability of $D[N_n]$ is $\ell\cdot(1-p_n)$, where $1-p_n$ is the probability that the first component of $N_n$ yields $N_{n-1}$. But it is not possible to express $|\varepsilon - \ell\cdot(1-p_n)|$ as a function of $|\varepsilon - \ell|$ and $n$: intuitively, the separation that the context $D$ can achieve depends not only on the separation that the context $C$ can achieve, but also on how $C$ achieves it. Moreover, the environment may of course decide to use both components of the pair, and to make them interact in an arbitrary way. Summing up, although the mechanism by which these terms are constructed seems locally easy to measure, it is hard to get any idea of how the distance between them evolves when $n$ tends to infinity. 
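The following Python sketch illustrates numerically why no such function exists. For illustration only, it assumes that the first component of $N_n$ returns $N_{n-1}$ with probability $1-p$ and diverges with probability $p$, and it fixes the hypothetical value $p = \frac{1}{10}$. `separation_via_first` is an illustrative name.

```python
from fractions import Fraction


def separation_via_first(eps, ell, p):
    """Separation achieved by D, given that C[M_{n-1}] converges with
    probability eps, C[N_{n-1}] with probability ell, and the first
    component of N_n yields N_{n-1} with probability 1 - p (else diverges)."""
    return abs(eps - ell * (1 - p))


p = Fraction(1, 10)  # hypothetical noise probability, for illustration only

# Two contexts C with the same separation |eps - ell| = 0 ...
same_a = separation_via_first(Fraction(1), Fraction(1), p)
same_b = separation_via_first(Fraction(0), Fraction(0), p)
# ... and two contexts with the same separation |eps - ell| = 1/2.
half_a = separation_via_first(Fraction(1), Fraction(1, 2), p)
half_b = separation_via_first(Fraction(1, 2), Fraction(1), p)
print(same_a, same_b, half_a, half_b)
```

Pairs of contexts with the same $|\varepsilon - \ell|$ lead to different separations for $D$. The latter therefore cannot be computed from $|\varepsilon - \ell|$ alone.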


3 Preliminaries 

In this section, an affine and probabilistic λ-calculus, which is the object of study of this paper, 
will be introduced formally, together with a notion of context distance for it. 


3.1 An Affine, Untyped, Probabilistic λ-Calculus 

We endow the λ-calculus with a probabilistic operator $\oplus$, which corresponds to the possibility for the program to choose between its two arguments, each with the same probability. Terms are expressions generated by the following grammar: 

$$M ::= x \mid \lambda x.M \mid MM \mid M \oplus M \mid \Omega,$$

where $\Omega$ models divergence¹ and $x$ ranges over a countable set of variables. 

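The class of affine terms, which model functions using their arguments at most once, can be isolated by way of a formal system, whose judgements are of the form $\Gamma \vdash M$ (where $\Gamma$ is any finite set of variables) and whose rules are the following (where $\Gamma, \Delta$ stands for the union of two disjoint sets of variables): 

$$\frac{}{\Gamma, x \vdash x} \qquad \frac{\Gamma, x \vdash M}{\Gamma \vdash \lambda x.M} \qquad \frac{\Gamma \vdash M \qquad \Delta \vdash N}{\Gamma, \Delta \vdash MN} \qquad \frac{\Gamma \vdash M \qquad \Gamma \vdash N}{\Gamma \vdash M \oplus N} \qquad \frac{}{\Gamma \vdash \Omega}$$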
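Our own reading of these rules is that, up to α-renaming of bound variables, $\Gamma \vdash M$ is derivable exactly when the free variables of $M$ are in $\Gamma$ and the two sides of every application share no free variable. The two branches of $\oplus$ may share variables, and unused variables are allowed. The Python sketch below turns that assumption into a checker. The term classes and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    var: str
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Choice:  # fair choice M (+) N
    left: object
    right: object

@dataclass(frozen=True)
class Omega:  # divergence
    pass


def free_vars(m):
    """Free variables of a term."""
    if isinstance(m, Var):
        return frozenset({m.name})
    if isinstance(m, Lam):
        return free_vars(m.body) - {m.var}
    if isinstance(m, App):
        return free_vars(m.fun) | free_vars(m.arg)
    if isinstance(m, Choice):
        return free_vars(m.left) | free_vars(m.right)
    return frozenset()  # Omega


def is_affine(m):
    """The two sides of an application must not share a free variable,
    whereas both branches of (+) may use the same variables."""
    if isinstance(m, Lam):
        return is_affine(m.body)
    if isinstance(m, App):
        return (is_affine(m.fun) and is_affine(m.arg)
                and not (free_vars(m.fun) & free_vars(m.arg)))
    if isinstance(m, Choice):
        return is_affine(m.left) and is_affine(m.right)
    return True  # variables and Omega


def is_program(m):
    """Programs are the terms derivable from the empty set: closed and affine."""
    return not free_vars(m) and is_affine(m)


x = Var("x")
print(is_program(Lam("x", Choice(x, x))))  # True: (+) may share x
print(is_program(Lam("x", App(x, x))))     # False: x is used twice
```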

A program is a term $M$ such that $\emptyset \vdash M$, and $\mathcal P$ is the set of all such terms. We will also call them closed terms. We say that a program is a value if it is of the form $\lambda x.M$, and $\mathcal V$ is the set of such programs. The semantics of the just defined calculus is expressed as a binary relation $\Downarrow$ between programs and value subdistributions (or simply value distributions), i.e., functions from values to real numbers whose sum is smaller than or equal to 1. The relation $\Downarrow$ is inductively defined by the following rules: 

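$$\frac{V \text{ value}}{V \Downarrow \{V^1\}} \qquad \frac{}{\Omega \Downarrow \emptyset} \qquad \frac{M \Downarrow \mathscr D \qquad N \Downarrow \mathscr E}{M \oplus N \Downarrow \frac12\mathscr D + \frac12\mathscr E}$$

$$\frac{M \Downarrow \mathscr D \qquad N \Downarrow \mathscr E \qquad \{L\{V/x\} \Downarrow \mathscr F_{L,V}\}_{\lambda x.L \in S(\mathscr D),\, V \in S(\mathscr E)}}{MN \Downarrow \sum_{\lambda x.L \in S(\mathscr D),\, V \in S(\mathscr E)} \mathscr D(\lambda x.L)\cdot\mathscr E(V)\cdot\mathscr F_{L,V}}$$

¹Since we only consider affine terms, we cannot encode divergence by the usual constructions of the λ-calculus.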

where $S(\mathscr D)$ stands for the support of the distribution $\mathscr D$. The divergent program $\Omega$, as expected, evaluates to the empty value distribution $\emptyset$, which assigns 0 to any value. The expression $\{V^1\}$ stands for the Dirac value distribution on $V$; more generally, the expression $\{V_1^{p_1}, \ldots, V_n^{p_n}\}$ indicates the value distribution assigning probability $p_i$ to each $V_i$ (and 0 to any other value). 

For every program $M$, there exists precisely one value distribution $\mathscr D$ such that $M \Downarrow \mathscr D$, which we denote $[\![M]\!]$. This holds only because we restrict ourselves to affine terms. Moreover, $[\![M]\!]$ is always a finite distribution. The rule for application expresses the fact that the semantics is call-by-value: the argument is evaluated before being passed to the function. There is no special reason why we adopt call-by-value here, and all we are going to say also holds for (weak) call-by-name evaluation. 
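To make the rules concrete, here is a small Python interpreter for the big-step semantics. It is written as a sketch, not a reference implementation. Terms are hypothetical frozen dataclasses, and distributions are dictionaries from values to `Fraction`s. Termination relies on the input being an affine program; on non-affine terms the recursion may not terminate.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:  # values are exactly the abstractions
    var: str
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Choice:  # fair choice M (+) N
    left: object
    right: object

@dataclass(frozen=True)
class Omega:  # divergence
    pass


def subst(m, x, v):
    """M{V/x} for a closed value V, so no variable capture can occur."""
    if isinstance(m, Var):
        return v if m.name == x else m
    if isinstance(m, Lam):
        return m if m.var == x else Lam(m.var, subst(m.body, x, v))
    if isinstance(m, App):
        return App(subst(m.fun, x, v), subst(m.arg, x, v))
    if isinstance(m, Choice):
        return Choice(subst(m.left, x, v), subst(m.right, x, v))
    return m  # Omega


def evaluate(m):
    """Big-step semantics: [[M]] as a dict from values to probabilities."""
    if isinstance(m, Lam):
        return {m: Fraction(1)}
    if isinstance(m, Omega):
        return {}
    out = {}
    if isinstance(m, Choice):  # [[M (+) N]] = 1/2 [[M]] + 1/2 [[N]]
        for side in (evaluate(m.left), evaluate(m.right)):
            for v, p in side.items():
                out[v] = out.get(v, 0) + p / 2
        return out
    if isinstance(m, App):  # call-by-value: evaluate both sides, then beta
        args = evaluate(m.arg)
        for f, p in evaluate(m.fun).items():
            for v, q in args.items():
                for w, r in evaluate(subst(f.body, f.var, v)).items():
                    out[w] = out.get(w, 0) + p * q * r
        return out
    raise ValueError("only programs (closed terms) can be evaluated")


def convergence(m):
    """Convergence probability: the weight of [[M]]."""
    return sum(evaluate(m).values(), Fraction(0))
```

For instance, with `I = Lam("x", Var("x"))`, `evaluate(Choice(I, Omega()))` yields the distribution assigning $\frac12$ to $I$, so the convergence probability of $I \oplus \Omega$ is $\frac12$.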

In some circumstances, we need a more local view of how programs behave. For this reason, we define an equivalent small-step semantics, which allows us to reason about every single execution step. We first define a one-step semantics $\to$ between programs and distributions over programs: 

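$$\frac{}{\Omega \to \emptyset} \qquad \frac{}{M \oplus N \to \{M^{\frac12}, N^{\frac12}\}} \qquad \frac{}{(\lambda x.M)V \to \{M\{V/x\}^1\}} \qquad \frac{M \to \mathscr D}{MN \to \mathscr D N} \qquad \frac{N \to \mathscr E}{VN \to V\mathscr E}$$

(Here $\mathscr D N$ is the distribution assigning probability $\mathscr D(L)$ to $LN$, and similarly for $V\mathscr E$.)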

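Then we use it to define a small-step semantics $\Rrightarrow$, which is a relation between programs and value distributions and corresponds to performing as many $\to$-steps as possible. The rules are the following: 

$$\frac{V \text{ value}}{V \Rrightarrow \{V^1\}} \qquad \frac{M \to \mathscr D \qquad \{N \Rrightarrow \mathscr E_N\}_{N \in S(\mathscr D)}}{M \Rrightarrow \sum_{N \in S(\mathscr D)} \mathscr D(N)\cdot\mathscr E_N}$$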

Big-step and small-step semantics are equivalent: for every program $M$, there exists a unique distribution $\mathscr D$ such that $M \Rrightarrow \mathscr D$, and moreover $\mathscr D = [\![M]\!]$. 
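The small-step presentation can be sketched in the same style. Below, `step` implements the one-step relation $\to$ (function position first, then argument, then β-reduction on values), and `run` iterates it until only values remain, mirroring the rules for the small-step relation. The encoding and names are hypothetical, and affine input is assumed so that the recursion terminates.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    var: str
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Choice:
    left: object
    right: object

@dataclass(frozen=True)
class Omega:
    pass


def subst(m, x, v):
    """M{V/x} for a closed value V."""
    if isinstance(m, Var):
        return v if m.name == x else m
    if isinstance(m, Lam):
        return m if m.var == x else Lam(m.var, subst(m.body, x, v))
    if isinstance(m, App):
        return App(subst(m.fun, x, v), subst(m.arg, x, v))
    if isinstance(m, Choice):
        return Choice(subst(m.left, x, v), subst(m.right, x, v))
    return m


def step(m):
    """One-step relation M -> D (call-by-value, function evaluated first)."""
    if isinstance(m, Omega):
        return {}                                  # Omega -> empty distribution
    if isinstance(m, Choice):
        d = {}
        for branch in (m.left, m.right):           # {M^1/2, N^1/2}
            d[branch] = d.get(branch, 0) + Fraction(1, 2)
        return d
    if isinstance(m, App):
        if not isinstance(m.fun, Lam):             # M N -> D N
            return {App(n, m.arg): p for n, p in step(m.fun).items()}
        if not isinstance(m.arg, Lam):             # V N -> V E
            return {App(m.fun, n): p for n, p in step(m.arg).items()}
        return {subst(m.fun.body, m.fun.var, m.arg): Fraction(1)}
    raise ValueError("values and open terms do not reduce")


def run(m):
    """Iterate -> until only values remain, summing the weighted results."""
    if isinstance(m, Lam):
        return {m: Fraction(1)}
    out = {}
    for n, p in step(m).items():
        for v, q in run(n).items():
            out[v] = out.get(v, 0) + p * q
    return out


I = Lam("x", Var("x"))
print(run(App(I, Choice(I, Omega()))))
```

On these small examples the result agrees with the big-step semantics. For instance, both give $\frac12$ to $I$ for $I(I \oplus \Omega)$.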

3.2 Context Distance 

We now want to define a notion of observation for programs which measures the convergence probability of a program. We will do that following the previous literature on this subject. For any distribution $\mathscr D$ over a set $A$, its sum $\sum_{a\in A}\mathscr D(a)$ is indicated as $\sum\mathscr D$ and is said to be the weight of $\mathscr D$. The convergence probability of a program $M$, which we denote $\mathcal P^{\mathit{conv}}(M)$, is simply $\sum[\![M]\!]$, i.e., the weight of its semantics. For instance, the convergence probability of $\Omega$ is zero. 

The environment, as usual, is modelled by the notion of a context, which is nothing more than a term with a single occurrence of the hole $[\cdot]$. Contexts are generated by the following grammar: 

$$C ::= [\cdot] \mid M \mid \lambda x.C \mid CM \mid MC \mid C \oplus C.$$

Affine contexts can be identified by a formal system akin to the one for terms. We write $C[M]$ for the program obtained by replacing $[\cdot]$ by the closed term $M$ in $C$. The interaction of a program $M$ with a context $C$ is the execution of the program $C[M]$. 

We now consider three different ways of comparing programs, based on their behaviour when interacting with the environment: a preorder $\le^{ctx}$, an equivalence relation $\equiv^{ctx}$ and a map $\delta^{ctx}$: 

Definition 1 (Context Equivalence, Context Distance) Let $M$ and $N$ be two programs. Then we write $M \le^{ctx} N$ if and only if for every context $C$, it holds that $\mathcal P^{\mathit{conv}}(C[M]) \le \mathcal P^{\mathit{conv}}(C[N])$. 

If $M \le^{ctx} N$ and $N \le^{ctx} M$, then we say that the two terms are context equivalent, and we write $M \equiv^{ctx} N$. With the same hypotheses, we say the context distance between $M$ and $N$ is the real number $\delta^{ctx}(M,N)$ defined as $\sup_C|\mathcal P^{\mathit{conv}}(C[M]) - \mathcal P^{\mathit{conv}}(C[N])|$. 
Please observe that, following [5], we only compare programs and not arbitrary terms. This is 
anyway harmless in an affine setting. 

Example 1 Let $I$ be the identity $\lambda x.x$. $I$ and $\Omega$ are as far apart as two programs can be: $\delta^{ctx}(I,\Omega) = 1$. To prove that, it suffices to find a context which always converges for one of the terms and always diverges for the other one. We can take $C = [\cdot]$: we have $\mathcal P^{\mathit{conv}}(C[I]) = 1$ and $\mathcal P^{\mathit{conv}}(C[\Omega]) = 0$. 

Of course, $I$ and $\Omega$ are not context equivalent. Throwing in probabilistic choice can complicate matters a bit. Consider the two terms $I \oplus \Omega$ and $I$. One can easily prove that $\delta^{ctx}(I \oplus \Omega, I) \ge \frac12$: just consider $C = [\cdot]$. However, showing that the above inequality is in fact an equality requires showing that no context can separate them more, which is possible, but definitely harder. This will be shown in the next section, using a trace-based characterisation of the context distance. 

3.3 On Pseudometrics 

Which properties does the context distance satisfy, and which structure does it then give to the set of programs? This section answers these questions, and prepares the ground for the sequel by fixing some terminology. 

Definition 2 (Pseudometrics) Let $S$ be a set. A premetric on $S$ is any function $\mu : S \times S \to \mathbb R$ such that $0 \le \mu(s,t) \le 1$ and $\mu(s,s) = 0$ for all $s, t \in S$. A pseudometric on $S$ is any premetric such that for every $s,t,u \in S$, it holds that $\mu(s,t) = \mu(t,s)$ and $\mu(s,t) \le \mu(s,u) + \mu(u,t)$. The set of all pseudometrics on $S$ is indicated with $\mathcal M(S)$. 

Please observe that pseudometrics are not metrics in the usual sense, since $\mu(s,t) = 0$ does not necessarily imply that $s = t$. Given a pseudometric $\mu$, we can construct an equivalence relation by considering the kernel of $\mu$, that is, the set of those pairs $(s,t)$ such that $\mu(s,t) = 0$. It is easy to prove that the context distance is indeed a pseudometric, and that its kernel is context equivalence. We would now like to define a preorder $\sqsubseteq$ on pseudometrics in such a way that if $\mu \sqsubseteq \nu$, then the kernel of $\mu$ is included in the kernel of $\nu$. The natural choice, then, is the following definition, which is the reverse of the pointwise order on $[0,1]$: 

Definition 3 (Pseudometric Ordering) Let $S$ be any set, and let $\mu$ and $\nu$ be two pseudometrics in $\mathcal M(S)$. Then we stipulate that $\mu \sqsubseteq \nu$ if and only if, for every $s,t \in S$, we have $\nu(s,t) \le \mu(s,t)$. 
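The following Python sketch checks the axioms of Definition 2 and the ordering of Definition 3 on a finite set. It illustrates that when $\mu$ is below $\nu$ (i.e., $\nu$ is pointwise smaller than $\mu$), the kernel of $\mu$ is included in the kernel of $\nu$. The three-element set and the two pseudometrics (the discrete one, and one identifying `a` and `b`) are made-up examples.

```python
from itertools import product


def is_pseudometric(points, mu):
    """Definition 2 on a finite set: values in [0, 1], mu(s, s) = 0,
    symmetry and the triangle inequality."""
    pairs = list(product(points, repeat=2))
    if any(not 0 <= mu[s, t] <= 1 for s, t in pairs):
        return False
    if any(mu[s, s] != 0 for s in points):
        return False
    if any(mu[s, t] != mu[t, s] for s, t in pairs):
        return False
    return all(mu[s, t] <= mu[s, u] + mu[u, t]
               for s, t, u in product(points, repeat=3))


def below(mu, nu, points):
    """mu is below nu (Definition 3): nu(s, t) <= mu(s, t) for all s, t."""
    return all(nu[s, t] <= mu[s, t] for s, t in product(points, repeat=2))


def kernel(mu, points):
    """The pairs at distance 0."""
    return {(s, t) for s, t in product(points, repeat=2) if mu[s, t] == 0}


S = ["a", "b", "c"]
discrete = {(s, t): 0 if s == t else 1 for s, t in product(S, repeat=2)}
coarser = dict(discrete)
coarser["a", "b"] = coarser["b", "a"] = 0  # identify a and b
print(below(discrete, coarser, S), kernel(coarser, S) - kernel(discrete, S))
```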

Lemma 1 For any set $S$, $(\mathcal M(S), \sqsubseteq)$ is a complete lattice. 

But when, precisely, can a pseudometric on programs be considered as a sound notion of distance? 
First of all, we would like it to put two programs at least as far as the difference between their 
convergence probabilities, since this is precisely our notion of observation: 

Definition 4 (Adequacy) Let $\mu$ be a pseudometric on the set of programs. Then $\mu$ is an adequate pseudometric if for any programs $M$ and $N$, we have $|\sum[\![M]\!] - \sum[\![N]\!]| \le \mu(M,N)$. 

Secondly, we are interested in how programs behave when interacting with the environment. 
In particular, if we have two terms $M$ and $N$ at a given distance $\varepsilon$, and we put them in an environment $C$, we would like a pseudometric $\mu$ to give us some information about the distance between $C[M]$ and $C[N]$. This is the idea behind the following, standard, definition: 

Definition 5 (Non-Expansiveness) Let $\mu$ be a pseudometric on programs. We say that $\mu$ is non-expansive if for every pair of programs $M$ and $N$ and for every context $C$, we have $\mu(C[M], C[N]) \le \mu(M,N)$. 

Non-expansiveness is the natural generalisation of the usual notion of congruence: an equivalence relation $R$ on programs is a congruence if for every context $C$, $M \mathrel{R} N$ implies $C[M] \mathrel{R} C[N]$. 

By construction, $\delta^{ctx}$ is a non-expansive pseudometric. We can also adapt the notion of soundness to pseudometrics: $\mu$ is said to be a sound pseudometric on programs if $\mu \sqsubseteq \delta^{ctx}$, i.e., if $\delta^{ctx}(M,N) \le \mu(M,N)$ for all programs $M$ and $N$. Clearly, any adequate and non-expansive pseudometric is sound. In the rest of this paper, we will only deal with pseudometrics, but for the sake of simplicity we will refer to them simply as metrics. 
4 The Trace Distance 


The first notion of metric we study is based on traces, i.e., linear tests. This is handier than the 
context distance, since contexts are replaced by objects with a simpler structure. 

4.1 Definition 

A trace $s$ is a sequence of the form $@V_1 \cdots @V_n$, where $V_1, \ldots, V_n$ are values, and we write $\mathit{Tr}$ for the set of traces. In other words, traces are generated by the following grammar: 

$$s ::= \epsilon \mid @V \cdot s$$

We define the probability that a program accepts a trace inductively on the length of the trace, as follows: 

$$\Pr(\lambda x.M, \epsilon) = 1; \qquad \Pr(\lambda x.M, @V\cdot s) = \Pr(M\{V/x\}, s); \qquad \Pr(M, s) = \sum_{V} [\![M]\!](V)\cdot\Pr(V, s) \ \text{ if } M \notin \mathcal V.$$

Please observe that the probability that a program $M$ accepts a trace $s = @V_1 \cdots @V_n$ is the probability of convergence of $MV_1 \cdots V_n$. We are now going to define a metric, based on the probability that programs accept arbitrary traces: 

Definition 6 Let $M$, $N$ be two programs. Then we define the trace distance between them as $\delta^{tr}(M,N) = \sup_s|\Pr(M,s) - \Pr(N,s)|$. One can then define trace equivalence and the trace preorder in the expected way. 

Please observe that $\delta^{tr}$ is a pseudometric on programs in the sense of Definition 2, and that it is an adequate one. The kernel of $\delta^{tr}$ is nothing more than trace equivalence. 
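The trace distance can be explored mechanically on small examples. The Python sketch below implements $\Pr(M, s)$ following the three defining equations, on top of a big-step evaluator. It then computes the supremum of Definition 6 restricted to a finite, hand-picked set of traces, which only yields a lower bound on the trace distance. Term classes and function names are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    var: str
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Choice:
    left: object
    right: object

@dataclass(frozen=True)
class Omega:
    pass


def subst(m, x, v):
    """M{V/x} for a closed value V."""
    if isinstance(m, Var):
        return v if m.name == x else m
    if isinstance(m, Lam):
        return m if m.var == x else Lam(m.var, subst(m.body, x, v))
    if isinstance(m, App):
        return App(subst(m.fun, x, v), subst(m.arg, x, v))
    if isinstance(m, Choice):
        return Choice(subst(m.left, x, v), subst(m.right, x, v))
    return m


def evaluate(m):
    """Big-step semantics [[M]] of an affine program."""
    if isinstance(m, Lam):
        return {m: Fraction(1)}
    out = {}
    if isinstance(m, Choice):
        for side in (evaluate(m.left), evaluate(m.right)):
            for v, p in side.items():
                out[v] = out.get(v, 0) + p / 2
    elif isinstance(m, App):
        args = evaluate(m.arg)
        for f, p in evaluate(m.fun).items():
            for v, q in args.items():
                for w, r in evaluate(subst(f.body, f.var, v)).items():
                    out[w] = out.get(w, 0) + p * q * r
    return out  # Omega evaluates to the empty distribution


def accepts(m, trace):
    """Pr(M, s), with the trace s = @V1 ... @Vn given as a tuple of values."""
    total = Fraction(0)
    for lam, p in evaluate(m).items():  # Pr(M, s) = sum_V [[M]](V) * Pr(V, s)
        if not trace:                   # Pr(lambda x.M, eps) = 1
            total += p
        else:                           # Pr(lambda x.M, @V.s) = Pr(M{V/x}, s)
            total += p * accepts(subst(lam.body, lam.var, trace[0]), trace[1:])
    return total


def trace_distance(m, n, traces):
    """max_s |Pr(M, s) - Pr(N, s)| over finitely many traces (a lower bound)."""
    return max(abs(accepts(m, s) - accepts(n, s)) for s in traces)


I = Lam("x", Var("x"))
DIV = Lam("x", Omega())  # the value lambda x.Omega
TRACES = [(), (I,), (DIV,), (DIV, I), (I, I)]
print(trace_distance(I, Omega(), TRACES))             # 1
print(trace_distance(Choice(I, Omega()), I, TRACES))  # 1/2
```

With these traces the sketch gives 1 for the pair $I, \Omega$ and $\frac12$ for the pair $I \oplus \Omega, I$. For the latter, the matching upper bound needs the argument over all traces in Example 2.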

Example 2 $\delta^{tr}(I, \Omega) = 1$: we have to find a trace that separates them that much. It is enough to consider the empty trace: it holds that $\Pr(I, \epsilon) = 1$ and $\Pr(\Omega, \epsilon) = 0$. The trace distance $\delta^{tr}(I \oplus \Omega, I)$ between $I \oplus \Omega$ and $I$ is $\frac12$. Showing that it is at least $\frac12$ is easy: it is sufficient to consider the empty trace. The other inequality requires evaluating, for any trace $s$, the probability of accepting it. This is however much easier than dealing with all contexts, because we can now control the structure of the overall program we obtain: for any trace $s = @V_1 \cdots @V_n$, since $[\![I \oplus \Omega]\!] = \frac12[\![I]\!]$, we can see that 

$$\Pr(I \oplus \Omega, s) = \tfrac12\Pr(I, s) + \tfrac12\Pr(\Omega, s) = \tfrac12\Pr(I, s).$$

The difference (in absolute value) between $\Pr(I \oplus \Omega, s)$ and $\Pr(I, s)$, then, cannot be greater than $\frac12$. 
The trace distance and the context distance indeed coincide, as do the trace and context preorders, and the trace and context equivalences. In the rest of this section, we will give the details of the proof for the pseudometric case, but the proof is similar for the preorders and the equivalences. It is easy to realise that the context distance is a lower bound on the trace distance with respect to $\sqsubseteq$ (pointwise, the trace distance never exceeds the context distance), since any trace $@V_1 \cdots @V_n$ can be seen as the context $[\cdot]V_1 \cdots V_n$. 

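Lemma 2 $\delta^{ctx} \sqsubseteq \delta^{tr}$, i.e., $\delta^{tr}(M,N) \le \delta^{ctx}(M,N)$ for all programs $M$ and $N$. 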

Proof. For any trace $s = @V_1 \cdots @V_n$ which separates $M$ and $N$ by $\varepsilon$, we can easily construct a context which separates them by the same quantity: just take $C = [\cdot]V_1 \cdots V_n$. □ 

4.2 Non-Expansiveness 

Are there contexts that can separate strictly more than traces? In order to show that this is not the case, it is enough to show that $\delta^{tr}$ is non-expansive: 

Theorem 1 Let $M$ and $N$ be two programs, and let $C$ be a context. Then $\delta^{tr}(C[M], C[N]) \le \delta^{tr}(M,N)$. 
Since $\delta^{tr}$ is adequate, we can conclude that the trace metric and the context metric actually coincide: 

Theorem 2 $\delta^{tr} = \delta^{ctx}$. 

The rest of this section is devoted to an outline of the proof of Theorem 1. The proof we give here is roughly inspired by the proof of congruence of trace equivalence for a non-deterministic λ-calculus [5]. The overall structure of the proof is the following. We first express the capacity of a program to perform a trace by means of a labelled transition system (LTS in the following) whose states are distributions over programs. Then we consider another LTS whose states are distributions over pairs of contexts and programs, which intuitively models the execution of $C[M]$ but keeps the evolution of $C$ and $M$ apart. 

4.2.1 The LTS 

The first LTS has distributions over programs as states and traces as actions. We indicate with $\xRightarrow{s}$ the transition relation associated with it. We are in fact going to define it on top of an auxiliary labelled relation $\xrightarrow{a}$. Intuitively, the idea behind $\xrightarrow{a}$ is to consider a term as a process which can perform actions. There are two kinds of possible actions: an internal action $\tau$, which corresponds to the internal reduction of the term, and external actions, which correspond to the application of an argument $V$ to the term. More precisely, this labelled relation is defined as a subset of the set $\mathrm{Distr}(\mathcal P) \times \mathcal A \times \mathrm{Distr}(\mathcal P)$, where the set $\mathcal A$ is defined as follows: 

Definition 7 We define the set of actions $\mathcal A$ by: 

$$\mathcal A = \{\tau\} \cup \{@V \mid V \text{ value}\}.$$

Intuitively, a $\tau$-step corresponds to an internal computation step of one term in the support of the distribution, while a $@V$-step corresponds to an interaction with the environment, which provides $V$ as an argument. 

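Definition 8 We define a labelled transition relation $\to\ \subseteq \mathrm{Distr}(\mathcal P) \times \mathcal A \times \mathrm{Distr}(\mathcal P)$, written $\mathscr D \xrightarrow{a} \mathscr E$, by the rules of Figure 2. (We write $\mathscr D + \mathscr E$ for the sum of $\mathscr D$ and $\mathscr E$ when we want to insist that $\mathscr D$ and $\mathscr E$ have disjoint supports.) 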


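$$\frac{M \to \mathscr E}{\mathscr D + \alpha\cdot\{M^1\} \xrightarrow{\tau} \mathscr D + \alpha\cdot\mathscr E} \qquad \frac{\mathscr D \text{ is a value distribution}}{\mathscr D \xrightarrow{@V} \sum_{\lambda x.M \in S(\mathscr D)} \mathscr D(\lambda x.M)\cdot\{M\{V/x\}^1\}}$$

Figure 2: One-step Trace Relations on Program Distributions. 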

The relation $\xRightarrow{s}$ is defined as the accumulation of several steps of $\xrightarrow{a}$. We now define the LTS. 

Definition 9 We define the LTS as follows: 

• Its set of states is $\mathrm{Distr}(\mathcal P)$. 

• Its set of labels is the set of traces $\mathit{Tr}$. 

• Its transition relation $\xRightarrow{s}$ is defined by the rules of Figure 3. 

Please observe that these relations are not probabilistic. The relation $\xrightarrow{\tau}$ is non-deterministic, since at each step we can decide which term of the distribution we want to reduce. However, $\xrightarrow{\tau}$ is strongly normalising and confluent.

Lemma 3 The relation $\xrightarrow{\tau}$ is strongly normalising.

Figure 3: Small-step Trace Relations on Program Distributions.


Proof. • We first show that $\xrightarrow{\tau}$ is terminating. For a term $M$, we define a quantity $|M| \in \mathbb{N}$, which corresponds to the size of the term:

$$|\Omega| = 0;\quad |x| = 1;\quad |\lambda x.M| = 1 + |M|;\quad |MN| = |M| + |N|;\quad |M \oplus N| = 1 + \max\{|M|, |N|\}.$$

Since our λ-calculus is linear, $|M|$ decreases during the execution of every program $M$. More precisely, if $M \to \mathcal{E}$, then $|N| < |M|$ for every $N \in S(\mathcal{E})$. (This is easily checked by inspecting the reduction rules.)

Moreover, if $M \to \mathcal{E}$, then the cardinality of $S(\mathcal{E})$ is at most 2. So, if for a distribution $\mathcal{D}$ we write

$$|\mathcal{D}| = \sum_{M \in S(\mathcal{D})} 3^{|M|},$$

we can see that for every $\mathcal{D}$, if $\mathcal{D} \xrightarrow{\tau} \mathcal{E}$, then $|\mathcal{E}| < |\mathcal{D}|$.

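• Moreover, let $\mathcal{D}$ be a distribution over programs. Let $\mathcal{E}$ be such that $\mathcal{D} \xrightarrow{\tau}{}^{n} \mathcal{E}$ and $\mathcal{E}$ is a normal form for $\xrightarrow{\tau}$. We show by induction on $n \in \mathbb{N}$ that

$$\mathcal{E} = [\![\mathcal{D}]\!].$$

– If $n = 0$, then $\mathcal{D} = \mathcal{E}$, and moreover $\mathcal{D}$ is a distribution over values. So the result holds.

– If $n > 0$, the result is a consequence of the rules of Figure 2.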

□ 
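The proof of Lemma 3 claims that $|\mathcal{D}|$ strictly decreases. This step can be spelled out as follows; the expansion is our own, using only the two facts stated in the proof. Suppose $\mathcal{D} = \mathcal{D}' + p\cdot\{M^1\} \xrightarrow{\tau} \mathcal{D}' + p\cdot\mathcal{E}$ with $M \to \mathcal{E}$. Then $M$ leaves the support. At most two terms enter it, and each has size at most $|M| - 1$:

$$|\mathcal{D}' + p\cdot\mathcal{E}| \;\le\; |\mathcal{D}| - 3^{|M|} + 2\cdot 3^{|M|-1} \;=\; |\mathcal{D}| - 3^{|M|-1} \;<\; |\mathcal{D}|.$$

Since $|\mathcal{D}|$ is a natural number, no infinite sequence of $\tau$-steps can exist.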


Moreover, we can show that $\xRightarrow{s}$ is in fact deterministic. That is, we have the following lemma:

Lemma 4 For every trace $s$ and every $\mathcal{D}$, there exists one and only one $\mathcal{E}$ such that $\mathcal{D} \xRightarrow{s} \mathcal{E}$.

The relation $\xRightarrow{s}$ is useful because it gives an alternative formulation of the probability that a program succeeds in performing a trace:

Lemma 5 Let $M$ be a program, $s$ a trace, and $\mathcal{E}$ the distribution such that $\{M^1\} \xRightarrow{s} \mathcal{E}$. Then $\Pr(s, M) = \sum \mathcal{E}$.

In fact, the labelled transition system allows us to extend the notion of probability of success of a trace. We can start not only from a program, but also from a probability distribution over programs:

$$\Pr(s, \mathcal{D}) = \sum \mathcal{E} \quad \text{when } \mathcal{D} \xRightarrow{s} \mathcal{E}.$$
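Lemma 5 and the definition of $\Pr(s, \mathcal{D})$ can be illustrated concretely. The Python sketch below computes trace probabilities in two stages: it first normalises a distribution with $\tau$-steps, then performs the $@V$-actions of the trace one after the other. This is a minimal sketch under our own assumptions, not the authors' implementation:
- terms are encoded as tuples, and all function names (`step`, `tau_normal_form`, `pr`) are hypothetical;
- reduction is call-by-value and left-to-right;
- Ω is modelled as reducing to the empty sub-distribution, so its mass is simply lost;
- substitution assumes closed arguments, which holds for programs.

```python
from fractions import Fraction

# Hypothetical term encoding: ('var', x) | ('lam', x, M) | ('app', M, N)
# | ('plus', M, N) for the fair choice M (+) N | ('omega',) for Omega.

def is_value(m):
    return m[0] == 'lam'

def subst(m, x, v):
    """M{V/x}; V is closed, so no variable capture can occur."""
    tag = m[0]
    if tag == 'var':
        return v if m[1] == x else m
    if tag == 'lam':
        return m if m[1] == x else ('lam', m[1], subst(m[2], x, v))
    if tag in ('app', 'plus'):
        return (tag, subst(m[1], x, v), subst(m[2], x, v))
    return m

def step(m):
    """One call-by-value step M -> E as a list of (probability, term), or None
    if M is a value. Omega steps to [], i.e. its mass is lost (an assumption)."""
    tag = m[0]
    if tag == 'omega':
        return []
    if tag == 'plus':
        return [(Fraction(1, 2), m[1]), (Fraction(1, 2), m[2])]
    if tag == 'app':
        f, a = m[1], m[2]
        if not is_value(f):
            return [(p, ('app', g, a)) for p, g in step(f)]
        if not is_value(a):
            return [(p, ('app', f, b)) for p, b in step(a)]
        return [(Fraction(1), subst(f[2], f[1], a))]
    return None  # abstractions (closed programs contain no free variables)

def tau_normal_form(dist):
    """Apply tau-steps until only values remain (terminates on closed affine programs)."""
    while True:
        new, changed = {}, False
        for m, p in dist.items():
            r = step(m)
            if r is None:
                new[m] = new.get(m, 0) + p
                continue
            changed = True
            for q, m2 in r:
                new[m2] = new.get(m2, 0) + p * q
        dist = new
        if not changed:
            return dist

def pr(trace, dist):
    """Pr(s, D) for a trace s = [V1, V2, ...] standing for @V1 . @V2 . ..."""
    dist = tau_normal_form(dict(dist))
    for v in trace:
        applied = {}
        for m, p in dist.items():  # value distribution: every m is an abstraction
            m2 = subst(m[2], m[1], v)
            applied[m2] = applied.get(m2, 0) + p
        dist = tau_normal_form(applied)
    return sum(dist.values(), Fraction(0))

I = ('lam', 'x', ('var', 'x'))
M = ('lam', 'z', ('plus', I, ('omega',)))  # lambda z.(I (+) Omega)
print(pr([], {M: 1}), pr([I], {M: 1}))     # 1 1/2
```

With $I = \lambda x.x$ and $M = \lambda z.(I \oplus \Omega)$, the sketch gives probability 1 for the empty trace and $\frac12$ for the trace $@I$.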

In the same way, we extend the trace preorder, the trace equivalence relation and the trace metric $\delta^{tr}$ to distributions. We can now use the relation $\xRightarrow{s}$ to give an equivalent formulation of Theorem 1:

if $\delta^{tr}(M, N) \le \varepsilon$, then for every trace $s$ and context $C$, if $\{C[M]^1\} \xRightarrow{s} \mathcal{D}$ and $\{C[N]^1\} \xRightarrow{s} \mathcal{E}$, it holds that $|\sum \mathcal{D} - \sum \mathcal{E}| \le \varepsilon$.

This statement, however, cannot yet be proved directly, because the information about how $C$ and the argument terms interact is lost.
4.2.2 The LTS $\mathcal{L}_{\mathcal{C}\times\mathcal{P}}$

It is then time to introduce our second LTS, called $\mathcal{L}_{\mathcal{C}\times\mathcal{P}}$, which will allow us to relate the behaviour of $\{C[M]^1\}$ to the behaviour of $M$. We want to describe the evolution of a system consisting of the program $M$ and the environment $C$, while keeping the program and the environment as separate as possible.

$\mathcal{C} \times \mathcal{P}$ is the set of pairs of the form $(C, M)$, where $C$ is a context and $M$ is a program. The states of $\mathcal{L}_{\mathcal{C}\times\mathcal{P}}$ are distributions over $\mathcal{C} \times \mathcal{P}$, and its labels are traces. The transition relation of $\mathcal{L}_{\mathcal{C}\times\mathcal{P}}$ corresponds to the transition relation of $\mathcal{L}_{\mathcal{P}}$, except that we keep track of which part of the whole system is the program and which part is the environment interacting with it.

We will use the following notation, which is useful in the formal definition of $\mathcal{L}_{\mathcal{C}\times\mathcal{P}}$. If $N$ is a term and $\mathcal{D}$ a distribution over $\mathcal{C} \times \mathcal{P}$, we define $\mathcal{D} \cdot N$ and $N \cdot \mathcal{D}$ as the distributions over $\mathcal{C} \times \mathcal{P}$ given by:


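$$\mathcal{D} \cdot N = \sum_{(C,M) \in S(\mathcal{D})} \mathcal{D}((C,M)) \cdot \{(CN, M)^1\}$$

$$N \cdot \mathcal{D} = \sum_{(C,M) \in S(\mathcal{D})} \mathcal{D}((C,M)) \cdot \{(NC, M)^1\}$$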

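If $C$ is a context, $M$ a term and $\mathcal{D}$ a distribution over $\mathcal{P}$, we define $(C\mathcal{D}, M)$ and $(\mathcal{D}C, M)$ as the distributions over $\mathcal{C} \times \mathcal{P}$ given by:

$$(C\mathcal{D}, M) = \sum_{N \in S(\mathcal{D})} \mathcal{D}(N) \cdot \{(CN, M)^1\}$$

$$(\mathcal{D}C, M) = \sum_{N \in S(\mathcal{D})} \mathcal{D}(N) \cdot \{(NC, M)^1\}$$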

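If $(C, M) \in \mathcal{C} \times \mathcal{P}$ is such that $C[M]$ is a value, we say, by abuse of notation, that $(C, M)$ is a value. If $\mathcal{D}$ is a distribution over $\mathcal{C} \times \mathcal{P}$ such that every $(C, M) \in S(\mathcal{D})$ is a value, we say that $\mathcal{D}$ is a value distribution.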

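Definition 10 We define $\mathcal{L}_{\mathcal{C}\times\mathcal{P}}$ as the labelled transition system such that:

• its set of states is the set of probability distributions over $\mathcal{C} \times \mathcal{P}$.

• its set of labels is the set of traces.

• its transition relation $\xRightarrow{s}_{\mathcal{C}\times\mathcal{P}}$ is defined by the rules of Figure 4. The definition uses an auxiliary one-step transition relation $\mathcal{D} \xrightarrow{a}_{\mathcal{C}\times\mathcal{P}} \mathcal{E}$, where $a \in \mathcal{A}$, and $\mathcal{D}$, $\mathcal{E}$ are distributions over $\mathcal{C} \times \mathcal{P}$.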


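Lemma 6 The relation $\xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}}$ on distributions over $\mathcal{C} \times \mathcal{P}$ is strongly normalising, and the normal forms of $\xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}}$ are value distributions.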

Proof. The proof is exactly the same as for the relation $\xrightarrow{\tau}$ on distributions over programs. We extend the definition of $|\cdot|$ to distributions over $\mathcal{C} \times \mathcal{P}$ by $|\mathcal{D}| = \sum_{(C,M) \in S(\mathcal{D})} 3^{|C[M]|}$, and we apply the same reasoning. □

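For a distribution $\mathcal{D}$ over $\mathcal{C} \times \mathcal{P}$, we write $\mathcal{D}^*$ for the normal form of $\mathcal{D}$ for the relation $\xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}}$. Write $\epsilon$ for the empty trace. Please observe that Lemma 6 implies that, for any distribution $\mathcal{D}$, there exists only one distribution $\mathcal{E}$ such that $\mathcal{D} \xRightarrow{\epsilon}_{\mathcal{C}\times\mathcal{P}} \mathcal{E}$, and moreover $\mathcal{E} = \mathcal{D}^*$.

The trace semantics for distributions over $\mathcal{C} \times \mathcal{P}$ allows us to extend the notions of trace equivalence, trace preorder and trace metric to distributions over $\mathcal{C} \times \mathcal{P}$ in a natural way.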


Figure 4: Small-step trace relations on distributions over $\mathcal{C} \times \mathcal{P}$.


4.2.3 Relating $\mathcal{L}_{\mathcal{P}}$ and $\mathcal{L}_{\mathcal{C}\times\mathcal{P}}$

Intuitively, a semantics for distributions over $\mathcal{C} \times \mathcal{P}$ allows us to separate the part of the semantics that concerns the program from the part that concerns the context. We would like to obtain the trace semantics of $C[M]$ just by looking at the semantics of $(C, M)$. We express this idea by relating the two trace semantics.

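Lemma 7 Let $M$ be a closed term, $C$ a context and $s$ a trace. Let $\mathcal{D}$ and $\mathcal{E}$ be such that $\{C[M]^1\} \xRightarrow{s} \mathcal{D}$ and $\{(C, M)^1\} \xRightarrow{s}_{\mathcal{C}\times\mathcal{P}} \mathcal{E}$. Then $\sum \mathcal{D} = \sum \mathcal{E}$.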

Proof. The proof of Lemma 7 is relatively technical, and is based on three auxiliary lemmas: Lemma 8, Lemma 9 and Lemma 10.

If $\mathcal{D}$ is a distribution over $\mathcal{C} \times \mathcal{P}$, we call $F(\mathcal{D})$ the distribution obtained by filling each context with its associated term. More formally, we define an operator $F(\cdot)$ on distributions over $\mathcal{C} \times \mathcal{P}$, which transforms every distribution into the corresponding distribution over terms:

$$F(\mathcal{D}) = \sum_{(C,M)} \mathcal{D}((C,M)) \cdot \{C[M]^1\}.$$

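The following lemma expresses the correspondence between the trace semantics on distributions over programs and the trace semantics on distributions over $\mathcal{C} \times \mathcal{P}$.

Lemma 8 Let $\mathcal{D}$, $\mathcal{E}$ be distributions over $\mathcal{C} \times \mathcal{P}$. If $\mathcal{D} \xRightarrow{s}_{\mathcal{C}\times\mathcal{P}} \mathcal{E}$, then $F(\mathcal{D}) \xRightarrow{s} F(\mathcal{E})$.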

We would also like some information in the other direction: if we know the trace semantics of the term $C[M]$, can we deduce something about the trace semantics of $(C, M)$? The following lemma gives a positive answer:

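Lemma 9 Let $\mathcal{D}$ be a distribution over $\mathcal{C} \times \mathcal{P}$ such that $F(\mathcal{D}) \xRightarrow{s} \mathcal{E}$. Let $\mathcal{F}$ be such that $\mathcal{D} \xRightarrow{s}_{\mathcal{C}\times\mathcal{P}} \mathcal{F}$. Then $\mathcal{E} = F(\mathcal{F})$.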

Proof. We first need an auxiliary lemma. It expresses the correspondence between the one-step relation on distributions over programs and the one-step relation on distributions over $\mathcal{C} \times \mathcal{P}$.
Lemma 10 Let $C$ be a context and $N$ a term. Let $\mathcal{E}$ be such that $C[N] \to \mathcal{E}$. Then there exists $\mathcal{F}$ such that $\{(C, N)^1\} \xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}} \mathcal{F}$ and $F(\mathcal{F}) = \mathcal{E}$.

Using this lemma, we now prove Lemma 9. The proof is by induction on the derivation of $F(\mathcal{D}) \xRightarrow{s} \mathcal{E}$.

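• The base case is the case where $F(\mathcal{D})$ is a value distribution (and consequently a normal form for $\xrightarrow{\tau}$), and where we are interested in the empty trace $\epsilon$. Then the derivation tree of $F(\mathcal{D}) \xRightarrow{\epsilon} \mathcal{E}$ is of the form:

$$\frac{F(\mathcal{D}) \text{ value distribution}}{F(\mathcal{D}) \xRightarrow{\epsilon} F(\mathcal{D})}$$

So $\mathcal{E} = F(\mathcal{D})$ is a value distribution. By the definition of values for distributions over $\mathcal{C} \times \mathcal{P}$, this means that $\mathcal{D}$ is a value distribution too. Hence $\mathcal{D} \xRightarrow{\epsilon}_{\mathcal{C}\times\mathcal{P}} \mathcal{D}$, and the result holds.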

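• The first induction case is the case where we do not start from a value distribution. Then the derivation tree of $F(\mathcal{D}) \xRightarrow{s} \mathcal{E}$ is of the form:

$$\frac{F(\mathcal{D}) \xrightarrow{\tau} \mathcal{G} \qquad \mathcal{G} \xRightarrow{s} \mathcal{E}}{F(\mathcal{D}) \xRightarrow{s} \mathcal{E}}$$

The only possible way to have obtained $F(\mathcal{D}) \xrightarrow{\tau} \mathcal{G}$ is to have used a derivation of the form:

$$\frac{M \to \mathcal{Y}}{F(\mathcal{D}) = \mathcal{H} + p\cdot\{M^1\} \xrightarrow{\tau} \mathcal{G} = \mathcal{H} + p\cdot\mathcal{Y}}$$

Since $F(\mathcal{D}) = \mathcal{H} + p\cdot\{M^1\}$, we can write $\mathcal{D} = \mathcal{J} + p\cdot\mathcal{X}$, with $F(\mathcal{J}) = \mathcal{H}$ and $F(\mathcal{X}) = \{M^1\}$. So for any $(C, N) \in S(\mathcal{X})$, we have $C[N] \to \mathcal{Y}$. By Lemma 10, for each such $(C, N)$ there exists $\mathcal{F}_{(C,N)}$ such that $F(\mathcal{F}_{(C,N)}) = \mathcal{Y}$ and $\{(C,N)^1\} \xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}} \mathcal{F}_{(C,N)}$. The rules of the trace semantics for distributions over $\mathcal{C} \times \mathcal{P}$ then give:

$$\mathcal{D} \xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}} \cdots \xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}} \mathcal{J} + p\cdot\sum_{(C,N) \in S(\mathcal{X})} \mathcal{X}((C,N))\cdot\mathcal{F}_{(C,N)} \qquad (1)$$

Moreover,

$$F\Big(\mathcal{J} + p\cdot\sum_{(C,N) \in S(\mathcal{X})} \mathcal{X}((C,N))\cdot\mathcal{F}_{(C,N)}\Big) = \mathcal{H} + p\cdot\mathcal{Y} = \mathcal{G}.$$

So we can apply the induction hypothesis, and there exists a distribution $\mathcal{Z}$ such that:

$$\mathcal{J} + p\cdot\sum_{(C,N) \in S(\mathcal{X})} \mathcal{X}((C,N))\cdot\mathcal{F}_{(C,N)} \xRightarrow{s}_{\mathcal{C}\times\mathcal{P}} \mathcal{Z} \qquad (2)$$

$$F(\mathcal{Z}) = \mathcal{E}. \qquad (3)$$

By equations (1) and (2), we conclude that $\mathcal{D} \xRightarrow{s}_{\mathcal{C}\times\mathcal{P}} \mathcal{Z}$. By (3), the result holds.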


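• The second induction case is the case where we start from a value distribution, and we are interested in a non-empty trace $s = @V\cdot t$. Then the derivation tree of $F(\mathcal{D}) \xRightarrow{s} \mathcal{E}$ is of the form:

$$\frac{F(\mathcal{D}) \xrightarrow{@V} \mathcal{G} \qquad \mathcal{G} \xRightarrow{t} \mathcal{E}}{F(\mathcal{D}) \xRightarrow{@V\cdot t} \mathcal{E}}$$

The only possible way to have obtained $F(\mathcal{D}) \xrightarrow{@V} \mathcal{G}$ is to have used a derivation of the form:

$$\frac{F(\mathcal{D}) \text{ value distribution}}{F(\mathcal{D}) \xrightarrow{@V} \mathcal{G} = \sum_{\lambda x.M} F(\mathcal{D})(\lambda x.M)\cdot\{M\{V/x\}^1\}}$$

For every $(C, N) \in S(\mathcal{D})$, let $M_{(C,N)}$ be such that $C[N] = \lambda x.M_{(C,N)}$. Using this notation, we can express $\mathcal{G}$ as a sum over the support of the distribution $\mathcal{D}$:

$$\mathcal{G} = \sum_{(C,N) \in S(\mathcal{D})} \mathcal{D}((C,N))\cdot\{M_{(C,N)}\{V/x\}^1\} \qquad (4)$$

We now define a distribution $\mathcal{F}_{(C,N)}$ over $\mathcal{C} \times \mathcal{P}$ for every $(C, N)$ in the support of $\mathcal{D}$. For every $(C, N) \in S(\mathcal{D})$, there are two possible cases:

– Either $C = [\cdot]$ and $N = \lambda x.M_{(C,N)}$. Then let $\mathcal{F}_{(C,N)} = \{([\cdot], M_{(C,N)}\{V/x\})^1\}$.

– Or $C = \lambda x.C'$ and $C'[N] = M_{(C,N)}$. Then let $\mathcal{F}_{(C,N)} = \{(C'\{V/x\}, N)^1\}$. Please observe that, since the calculus is linear, $C'\{V/x\}$ is indeed a context.

Now we can rewrite equation (4) as follows:

$$\mathcal{G} = F\Big(\sum_{(C,N) \in S(\mathcal{D})} \mathcal{D}((C,N))\cdot\mathcal{F}_{(C,N)}\Big) \qquad (5)$$

Moreover, for every $(C, N) \in S(\mathcal{D})$, we have $\{(C,N)^1\} \xrightarrow{@V}_{\mathcal{C}\times\mathcal{P}} \mathcal{F}_{(C,N)}$. So the rules of the one-step trace semantics for distributions over $\mathcal{C} \times \mathcal{P}$ allow us to say that:

$$\mathcal{D} \xrightarrow{@V}_{\mathcal{C}\times\mathcal{P}} \sum_{(C,N) \in S(\mathcal{D})} \mathcal{D}((C,N))\cdot\mathcal{F}_{(C,N)} \qquad (6)$$

By applying the induction hypothesis to $\mathcal{G} \xRightarrow{t} \mathcal{E}$ and using equation (5), we know that there exists $\mathcal{Z}$ such that:

$$\sum_{(C,N) \in S(\mathcal{D})} \mathcal{D}((C,N))\cdot\mathcal{F}_{(C,N)} \xRightarrow{t}_{\mathcal{C}\times\mathcal{P}} \mathcal{Z} \qquad (7)$$

$$F(\mathcal{Z}) = \mathcal{E} \qquad (8)$$

Using (6), (7) and the rules of the trace semantics for distributions over $\mathcal{C} \times \mathcal{P}$, we conclude that $\mathcal{D} \xRightarrow{@V\cdot t}_{\mathcal{C}\times\mathcal{P}} \mathcal{Z}$. Together with equation (8), this proves the result.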

□ 

□ 

4.3 ε-related distributions

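Lemma 7 allows us to give yet another equivalent formulation of Theorem 1:

if $\delta^{tr}(M, N) \le \varepsilon$, then whenever $\{(C, M)^1\} \xRightarrow{s}_{\mathcal{C}\times\mathcal{P}} \mathcal{D}$ and $\{(C, N)^1\} \xRightarrow{s}_{\mathcal{C}\times\mathcal{P}} \mathcal{E}$, it holds that $|\sum \mathcal{D} - \sum \mathcal{E}| \le \varepsilon$.

We will in fact show a stronger result, which uses the notion of ε-related distributions: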

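Definition 11 We say that two distributions $\mathcal{D}$ and $\mathcal{E}$ over $\mathcal{C} \times \mathcal{P}$ are ε-related, and we write $\mathcal{D} =^{par}_{\varepsilon} \mathcal{E}$, if there exist:
- $n \in \mathbb{N}$ and distinct contexts $C_1, \ldots, C_n$;
- positive real numbers $p_1, \ldots, p_n$ with $\sum_i p_i = 1$;
- distributions $\mathcal{D}_1, \ldots, \mathcal{D}_n$ and $\mathcal{E}_1, \ldots, \mathcal{E}_n$ over $\mathcal{P}$;

such that:

• $\mathcal{D} = \sum_{1 \le i \le n} p_i \cdot (C_i, \mathcal{D}_i)$;

• $\mathcal{E} = \sum_{1 \le i \le n} p_i \cdot (C_i, \mathcal{E}_i)$;

• for every $1 \le i \le n$, $\delta^{tr}(\mathcal{D}_i, \mathcal{E}_i) \le \varepsilon$.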
Please observe that, if $\delta^{tr}(M, N) \le \varepsilon$, then for every context $C$, the distributions $\{(C, M)^1\}$ and $\{(C, N)^1\}$ are ε-related. In fact, ε-relatedness captures the idea of a pair of distributions over $\mathcal{C} \times \mathcal{P}$ that represent the same environment, into which we put programs that are close for the trace pseudometric.

The following lemma can be seen as a stability result. If we start from ε-related distributions and perform a trace $s$, we end up in two distributions that are still ε-related.

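Lemma 11 Let $\mathcal{D}$, $\mathcal{E}$ be distributions over $\mathcal{C} \times \mathcal{P}$, and $\varepsilon \in [0,1]$ such that $\mathcal{D} =^{par}_{\varepsilon} \mathcal{E}$. Let $s$ be a trace. Let $\mathcal{F}$ and $\mathcal{G}$ be such that $\mathcal{D} \xRightarrow{s}_{\mathcal{C}\times\mathcal{P}} \mathcal{F}$ and $\mathcal{E} \xRightarrow{s}_{\mathcal{C}\times\mathcal{P}} \mathcal{G}$. Then $\mathcal{F} =^{par}_{\varepsilon} \mathcal{G}$.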

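Proof. If $\mathcal{D}$ is a distribution over $\mathcal{P}$, we write $\mathcal{D}^*$ for the distribution such that $\mathcal{D} \xRightarrow{\epsilon} \mathcal{D}^*$, where $\epsilon$ is the empty trace. Please observe that $\mathcal{D}^*$ is the normal form of $\mathcal{D}$ for the transition relation $\xrightarrow{\tau}$. Similarly, if $\mathcal{D}$ is a distribution over $\mathcal{C} \times \mathcal{P}$, we write $\mathcal{D}^*$ for the distribution such that $\mathcal{D} \xRightarrow{\epsilon}_{\mathcal{C}\times\mathcal{P}} \mathcal{D}^*$. We first show two auxiliary lemmas: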

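Lemma 12 Let $\mathcal{D}$, $\mathcal{E}$ be two distributions over $\mathcal{C} \times \mathcal{P}$ such that $\mathcal{D} =^{par}_{\varepsilon} \mathcal{E}$. Then $\mathcal{D}^* =^{par}_{\varepsilon} \mathcal{E}^*$.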

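Proof. We write $Kr(a, b)$ for the integer that equals 1 if $a = b$, and 0 otherwise. For any distribution $\mathcal{D}$ over $\mathcal{C} \times \mathcal{P}$, we write

$$n_{\max}(\mathcal{D}) = \max\{n \mid \exists \mathcal{F},\ \mathcal{D} \xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}}{}^{n} \mathcal{F}\}.$$

This number is finite, since $\xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}}$ is strongly normalising (Lemma 6) and finitely branching. We show the lemma by induction on $n = \max(n_{\max}(\mathcal{D}), n_{\max}(\mathcal{E}))$.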

• If $n = 0$, then $\mathcal{D} = \mathcal{D}^*$ and $\mathcal{E} = \mathcal{E}^*$, and the result holds.

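• If $n > 0$, then by hypothesis there exist:
- weights $p_1, \ldots, p_\ell$;
- contexts $C_1, \ldots, C_\ell$;
- distributions $\mathcal{D}_1, \ldots, \mathcal{D}_\ell$ and $\mathcal{E}_1, \ldots, \mathcal{E}_\ell$;

such that:

$$\mathcal{D} = \sum_{1 \le i \le \ell} p_i \cdot (C_i, \mathcal{D}_i) \qquad \mathcal{E} = \sum_{1 \le i \le \ell} p_i \cdot (C_i, \mathcal{E}_i)$$

Then there exist an index $i$ and a term $M \in S(\mathcal{D}_i) \cup S(\mathcal{E}_i)$ such that $C_i[M]$ is reducible. We consider every possible case for the form of $C_i[M]$: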

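– Either $C_i$ is an evaluation context, and some $M \in S(\mathcal{D}_i) \cup S(\mathcal{E}_i)$ is reducible. Intuitively, we want to reduce $\mathcal{D}_i$ and $\mathcal{E}_i$ as much as possible, since they are in evaluation position, and the two resulting distributions should again be ε-related. More precisely, for every $M \in S(\mathcal{D}_i) \cup S(\mathcal{E}_i)$, we write $\mathcal{G}_M$ for $\{M^1\}^*$, the normal form of $\{M^1\}$ for $\xrightarrow{\tau}$. The rules of $\xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}}$ then show that there exist $k_1$ and $k_2$ such that $k_1 + k_2 > 0$, and:

$$\mathcal{D} \xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}}{}^{k_1} \mathcal{D}' = \sum_{j \ne i} p_j \cdot (C_j, \mathcal{D}_j) + p_i \cdot \Big(C_i, \sum_M \mathcal{D}_i(M)\cdot\mathcal{G}_M\Big)$$

and

$$\mathcal{E} \xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}}{}^{k_2} \mathcal{E}' = \sum_{j \ne i} p_j \cdot (C_j, \mathcal{E}_j) + p_i \cdot \Big(C_i, \sum_M \mathcal{E}_i(M)\cdot\mathcal{G}_M\Big)$$

We can easily show that $\mathcal{D}' =^{par}_{\varepsilon} \mathcal{E}'$, and moreover that $\max(n_{\max}(\mathcal{D}'), n_{\max}(\mathcal{E}')) < \max(n_{\max}(\mathcal{D}), n_{\max}(\mathcal{E}))$. So we can apply the induction hypothesis, which gives $\mathcal{D}'^* =^{par}_{\varepsilon} \mathcal{E}'^*$, that is, $\mathcal{D}^* =^{par}_{\varepsilon} \mathcal{E}^*$.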
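– Or the reduction depends only on $C_i$, that is, there exist $q_1, \ldots, q_m$ and contexts $D_1, \ldots, D_m$ such that for every term $N$,

$$C_i[N] \to q_1 \cdot \{D_1[N]^1\} + \cdots + q_m \cdot \{D_m[N]^1\}.$$

Then the rules of $\xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}}$ show that there exist $k_1$ and $k_2$, with $k_1 + k_2 > 0$, such that:

$$\mathcal{D} \xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}}{}^{k_1} \mathcal{D}' = \sum_{j \ne i} p_j \cdot (C_j, \mathcal{D}_j) + p_i \cdot \sum_{1 \le k \le m} q_k \cdot (D_k, \mathcal{D}_i)$$

and

$$\mathcal{E} \xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}}{}^{k_2} \mathcal{E}' = \sum_{j \ne i} p_j \cdot (C_j, \mathcal{E}_j) + p_i \cdot \sum_{1 \le k \le m} q_k \cdot (D_k, \mathcal{E}_i).$$

The definition of ε-related distributions requires the contexts $(C_j)_j$ to be pairwise distinct. So, to show that the new distributions are still ε-related, we have to regroup identical contexts (for instance, it can be the case that $D_k = C_j$). We write $\mathcal{K} = \{C_j \mid j \ne i\} \cup \{D_k \mid 1 \le k \le m\}$ for the set of all contexts that can have been obtained at this step. For $C \in \mathcal{K}$, we take its total probability

$$p'_C = \sum_{j \ne i} Kr(C, C_j)\cdot p_j + \sum_{1 \le k \le m} Kr(C, D_k)\cdot p_i \cdot q_k,$$

and similarly

$$\mathcal{D}'_C = \sum_{j \ne i} Kr(C, C_j)\cdot\frac{p_j}{p'_C}\cdot\mathcal{D}_j + \sum_{1 \le k \le m} Kr(C, D_k)\cdot\frac{p_i \cdot q_k}{p'_C}\cdot\mathcal{D}_i,$$

$$\mathcal{E}'_C = \sum_{j \ne i} Kr(C, C_j)\cdot\frac{p_j}{p'_C}\cdot\mathcal{E}_j + \sum_{1 \le k \le m} Kr(C, D_k)\cdot\frac{p_i \cdot q_k}{p'_C}\cdot\mathcal{E}_i.$$

And now we have

$$\mathcal{D}' = \sum_{C \in \mathcal{K}} p'_C \cdot (C, \mathcal{D}'_C) \quad\text{and similarly}\quad \mathcal{E}' = \sum_{C \in \mathcal{K}} p'_C \cdot (C, \mathcal{E}'_C).$$

For every $C \in \mathcal{K}$, $\delta^{tr}(\mathcal{D}'_C, \mathcal{E}'_C) \le \varepsilon$. Indeed, $\mathcal{D}'_C$ and $\mathcal{E}'_C$ are the same convex combination of pairs of distributions, each pair being at trace distance at most $\varepsilon$.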

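– The last case is the case where the term and the context really interact. More precisely, $\mathcal{D}_i$ and $\mathcal{E}_i$ are value distributions, and moreover:

* Either $C_i = D[[\cdot]V]$, which means that the context passes a value to the program. Then the following facts are derivable with the rules of $\xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}}$:

$$\mathcal{D} \xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}} \mathcal{D}' = \sum_{j \ne i} p_j \cdot (C_j, \mathcal{D}_j) + p_i \cdot \Big(D, \sum_{\lambda x.N} \mathcal{D}_i(\lambda x.N)\cdot\{N\{V/x\}^1\}\Big)$$

and

$$\mathcal{E} \xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}} \mathcal{E}' = \sum_{j \ne i} p_j \cdot (C_j, \mathcal{E}_j) + p_i \cdot \Big(D, \sum_{\lambda x.N} \mathcal{E}_i(\lambda x.N)\cdot\{N\{V/x\}^1\}\Big)$$

We now show that $\mathcal{D}'$ and $\mathcal{E}'$ are ε-related. We again regroup identical contexts (for instance, it can be the case that $D = C_j$). We write $\mathcal{K} = \{C_j \mid j \ne i\} \cup \{D\}$ for the set of all contexts that can have been obtained at this step. For $C \in \mathcal{K}$, we take its total probability $p'_C = \sum_{j \ne i} Kr(C, C_j)\cdot p_j + Kr(C, D)\cdot p_i$, and we obtain:

$$\mathcal{D}'_C = \sum_{j \ne i} Kr(C, C_j)\cdot\frac{p_j}{p'_C}\cdot\mathcal{D}_j + Kr(C, D)\cdot\frac{p_i}{p'_C}\cdot\sum_{\lambda x.N} \mathcal{D}_i(\lambda x.N)\cdot\{N\{V/x\}^1\}$$

and

$$\mathcal{E}'_C = \sum_{j \ne i} Kr(C, C_j)\cdot\frac{p_j}{p'_C}\cdot\mathcal{E}_j + Kr(C, D)\cdot\frac{p_i}{p'_C}\cdot\sum_{\lambda x.N} \mathcal{E}_i(\lambda x.N)\cdot\{N\{V/x\}^1\}$$

By hypothesis, $\delta^{tr}(\mathcal{D}_j, \mathcal{E}_j) \le \varepsilon$ for every $j \ne i$. Moreover, for every trace $s$:

$$\begin{aligned}
&\Big|\Pr\Big(\sum_{\lambda x.N} \mathcal{D}_i(\lambda x.N)\cdot\{N\{V/x\}^1\}, s\Big) - \Pr\Big(\sum_{\lambda x.N} \mathcal{E}_i(\lambda x.N)\cdot\{N\{V/x\}^1\}, s\Big)\Big| \\
&= \Big|\sum_{\lambda x.N} \mathcal{D}_i(\lambda x.N)\cdot\Pr(N\{V/x\}, s) - \sum_{\lambda x.N} \mathcal{E}_i(\lambda x.N)\cdot\Pr(N\{V/x\}, s)\Big| \\
&= \Big|\sum_{\lambda x.N} \mathcal{D}_i(\lambda x.N)\cdot\Pr(\lambda x.N, @V\cdot s) - \sum_{\lambda x.N} \mathcal{E}_i(\lambda x.N)\cdot\Pr(\lambda x.N, @V\cdot s)\Big| \\
&= |\Pr(\mathcal{D}_i, @V\cdot s) - \Pr(\mathcal{E}_i, @V\cdot s)| \le \delta^{tr}(\mathcal{D}_i, \mathcal{E}_i) \le \varepsilon.
\end{aligned}$$

The relation $\delta^{tr}(\cdot,\cdot) \le \varepsilon$ on distributions over terms is stable under convex combinations, so the result holds.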

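* Or $C_i = (\lambda x.N)C$ for some context $C$. Then the rules of $\xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}}$ show that there exist $k_1$ and $k_2$ with $k_1 + k_2 > 0$ such that:

$$\mathcal{D} \xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}}{}^{k_1} \mathcal{D}' = \sum_{j \ne i} p_j \cdot (C_j, \mathcal{D}_j) + p_i \cdot (N\{C/x\}, \mathcal{D}_i)$$

and

$$\mathcal{E} \xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}}{}^{k_2} \mathcal{E}' = \sum_{j \ne i} p_j \cdot (C_j, \mathcal{E}_j) + p_i \cdot (N\{C/x\}, \mathcal{E}_i),$$

and we can easily see that $\mathcal{D}' =^{par}_{\varepsilon} \mathcal{E}'$.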


□ 
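The proof of Lemma 12 regroups identical contexts using the $Kr(C, C_j)$ bookkeeping. This step can be made concrete with a small Python sketch, which is our own illustration. The helper names `regroup` and `gap`, the context labels and the success probabilities are all hypothetical. For one fixed trace, the sketch checks that regrouping never increases the gap in success probabilities. This holds because the regrouped distributions are convex combinations of the original ones.

```python
from collections import defaultdict
from fractions import Fraction

def regroup(entries):
    """entries: list of (context, weight, d, e) with d, e dicts program -> mass.
    Merges entries that share a context (Kr(C, C_j) = 1): the new weight p'_C is
    the sum of the weights, and d'_C, e'_C are weight-proportional mixtures."""
    total = defaultdict(Fraction)
    for c, q, _, _ in entries:
        total[c] += q
    out = {}
    for c, pc in total.items():
        d_c, e_c = defaultdict(Fraction), defaultdict(Fraction)
        for c2, q, d, e in entries:
            if c2 != c:  # Kr(c, c2) = 0
                continue
            for m, mass in d.items():
                d_c[m] += q / pc * mass
            for m, mass in e.items():
                e_c[m] += q / pc * mass
        out[c] = (pc, dict(d_c), dict(e_c))
    return out

def gap(d, e, success):
    """|Pr(d, s) - Pr(e, s)| for one fixed trace s, given success[m] = Pr(m, s)."""
    pd = sum(q * success[m] for m, q in d.items())
    pe = sum(q * success[m] for m, q in e.items())
    return abs(pd - pe)

# Hypothetical success probabilities Pr(m, s) for one fixed trace s.
success = {'M1': 1, 'N1': Fraction(3, 4), 'M2': Fraction(1, 2),
           'N2': Fraction(1, 4), 'M3': 0, 'N3': 0}
entries = [('[.]', Fraction(1, 2), {'M1': 1}, {'N1': 1}),
           ('[.]', Fraction(1, 4), {'M2': 1}, {'N2': 1}),
           ('D[.]', Fraction(1, 4), {'M3': 1}, {'N3': 1})]
grouped = regroup(entries)
print(max(gap(d, e, success) for _, _, d, e in entries))   # 1/4
print(gap(grouped['[.]'][1], grouped['[.]'][2], success))  # 1/4
```

The two entries with context `'[.]'` merge into a single entry of weight 3/4. Its mixtures put mass 2/3 and 1/3 on the original programs.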
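Lemma 13 Let $\varepsilon > 0$. Let $\mathcal{D}$ and $\mathcal{E}$ be two value distributions over $\mathcal{C} \times \mathcal{P}$ (and consequently in normal form for $\xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}}$) with $\mathcal{D} =^{par}_{\varepsilon} \mathcal{E}$. Then for every value $V$, there exist $\mathcal{D}'$ and $\mathcal{E}'$ with $\mathcal{D}' =^{par}_{\varepsilon} \mathcal{E}'$ such that $\mathcal{D} \xrightarrow{@V}_{\mathcal{C}\times\mathcal{P}} \mathcal{D}'$ and $\mathcal{E} \xrightarrow{@V}_{\mathcal{C}\times\mathcal{P}} \mathcal{E}'$.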

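Proof. By hypothesis, $\mathcal{D} =^{par}_{\varepsilon} \mathcal{E}$, so we can write $\mathcal{D}$ and $\mathcal{E}$ as:

$$\mathcal{D} = \sum_i p_i \cdot (C_i, \mathcal{D}_i) \quad\text{and}\quad \mathcal{E} = \sum_i p_i \cdot (C_i, \mathcal{E}_i),$$

with $\delta^{tr}(\mathcal{D}_i, \mathcal{E}_i) \le \varepsilon$ for every $i$. When $\mathcal{F}$ is a term distribution in normal form (i.e. containing only values or non-reducible terms), we write

$$\mathcal{F}^V = \sum_{\lambda x.M} \mathcal{F}(\lambda x.M)\cdot\{M\{V/x\}^1\}.$$

Then we have

$$\mathcal{D} \xrightarrow{@V}_{\mathcal{C}\times\mathcal{P}} \sum_{i \mid C_i = [\cdot]} p_i \cdot ([\cdot], \mathcal{D}_i^V) + \sum_{i \mid C_i = \lambda x.C'_i} p_i \cdot (C'_i\{V/x\}, \mathcal{D}_i),$$

and similarly:

$$\mathcal{E} \xrightarrow{@V}_{\mathcal{C}\times\mathcal{P}} \sum_{i \mid C_i = [\cdot]} p_i \cdot ([\cdot], \mathcal{E}_i^V) + \sum_{i \mid C_i = \lambda x.C'_i} p_i \cdot (C'_i\{V/x\}, \mathcal{E}_i).$$

Finally, we can see that $\delta^{tr}(\mathcal{D}_i^V, \mathcal{E}_i^V) \le \varepsilon$. □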

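We can now use these two auxiliary lemmas to prove Lemma 11. The proof is by induction on the length of $s$:

• If $s$ is the empty trace, then $\mathcal{F} = \mathcal{D}^*$ and $\mathcal{G} = \mathcal{E}^*$, and $\mathcal{F} =^{par}_{\varepsilon} \mathcal{G}$ by Lemma 12.

• If $s = @V\cdot t$, let $\mathcal{D}'$, $\mathcal{E}'$ be such that $\mathcal{D}^* \xrightarrow{@V}_{\mathcal{C}\times\mathcal{P}} \mathcal{D}'$ and $\mathcal{E}^* \xrightarrow{@V}_{\mathcal{C}\times\mathcal{P}} \mathcal{E}'$. Since $\xrightarrow{\tau}_{\mathcal{C}\times\mathcal{P}}$ is confluent, we have:

$$\mathcal{D} \xRightarrow{\epsilon}_{\mathcal{C}\times\mathcal{P}} \mathcal{D}^* \xrightarrow{@V}_{\mathcal{C}\times\mathcal{P}} \mathcal{D}' \xRightarrow{t}_{\mathcal{C}\times\mathcal{P}} \mathcal{F}$$

and

$$\mathcal{E} \xRightarrow{\epsilon}_{\mathcal{C}\times\mathcal{P}} \mathcal{E}^* \xrightarrow{@V}_{\mathcal{C}\times\mathcal{P}} \mathcal{E}' \xRightarrow{t}_{\mathcal{C}\times\mathcal{P}} \mathcal{G}.$$

Then, by Lemma 12, $\mathcal{D}^* =^{par}_{\varepsilon} \mathcal{E}^*$. Applying Lemma 13, we obtain $\mathcal{D}' =^{par}_{\varepsilon} \mathcal{E}'$. Finally, applying the induction hypothesis to $t$, we obtain $\mathcal{F} =^{par}_{\varepsilon} \mathcal{G}$.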

□ 


4.3.1 Proof of Theorem 1

Theorem 1 is a direct consequence of Lemma 11. Indeed, let $M$ and $N$ be two programs at distance at most $\varepsilon$ for the trace metric, and let $\mathcal{D}$ and $\mathcal{E}$ be such that $\{(C, M)^1\} \xRightarrow{s}_{\mathcal{C}\times\mathcal{P}} \mathcal{D}$ and $\{(C, N)^1\} \xRightarrow{s}_{\mathcal{C}\times\mathcal{P}} \mathcal{E}$. As already observed, $\{(C, M)^1\}$ and $\{(C, N)^1\}$ are ε-related. By Lemma 11, $\mathcal{D}$ and $\mathcal{E}$ are ε-related. It is easy to see that this implies $|\sum \mathcal{D} - \sum \mathcal{E}| \le \varepsilon$.
4.4 Adding Pairs to the Calculus

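The trace distance and the results we have just presented about it can be extended to an affine λ-calculus with pairs. In such a calculus, the language of terms also includes the following two constructs:

$$M ::= \langle M, N \rangle \mid \mathtt{let}\ \langle x, y \rangle = M\ \mathtt{in}\ N.$$

We assume that terms are typed in any linear type system guaranteeing the absence of deadlocks (e.g., simple recursive types). We add the following rule to the big-step semantics:

$$\frac{M \Downarrow \mathcal{D} \qquad \{L \Downarrow \mathcal{E}_L \quad K \Downarrow \mathcal{F}_K\}_{\langle L,K\rangle \in S(\mathcal{D})} \qquad \{N\{V/x\}\{W/y\} \Downarrow \mathcal{G}_{V,W}\}_{V,W}}{\mathtt{let}\ \langle x, y \rangle = M\ \mathtt{in}\ N \Downarrow \sum_{\langle L,K\rangle \in S(\mathcal{D})} \sum_{V,W} \mathcal{D}(\langle L,K\rangle)\cdot\mathcal{E}_L(V)\cdot\mathcal{F}_K(W)\cdot\mathcal{G}_{V,W}}$$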

We would now like to extend the definition of a trace to pairs accordingly: which action should we perform on a term of the form $\langle M, N \rangle$? The naive solution would be to add projections to the trace language, $s ::= \pi_1 \cdot s \mid \pi_2 \cdot s$, and to extend the trace interpretation in the expected way:

$$\Pr(\langle M, N \rangle, \pi_1 \cdot t) = \Pr(M, t)$$

$$\Pr(\langle M, N \rangle, \pi_2 \cdot t) = \Pr(N, t)$$

However, with this extension the trace distance would no longer coincide with the context distance, as the following example shows.

Example 3 We compare the following terms:

$$M := \langle \lambda z.(I \oplus \Omega), \lambda z.(I \oplus \Omega) \rangle; \qquad N := \langle \lambda z.I, \lambda z.I \rangle.$$

These two terms are at context distance at least $\frac{3}{4}$. Indeed, consider the context $C := \mathtt{let}\ \langle x, y \rangle = [\cdot]\ \mathtt{in}\ (xI)(yI)$: we have $\sum [\![C[N]]\!] = 1$, while $\sum [\![C[M]]\!] = \frac{1}{4}$. However, no trace separates them by more than $\frac{1}{2}$. The interesting case is when $s = \pi_1 \cdot t$. But then:

$$|\Pr(M, s) - \Pr(N, s)| = |\Pr(\lambda z.(I \oplus \Omega), t) - \Pr(\lambda z.I, t)| \le \delta^{tr}(\lambda z.(I \oplus \Omega), \lambda z.I).$$

And it is easy to see that, in the calculus with pairs, we still have $\delta^{tr}(\lambda z.(I \oplus \Omega), \lambda z.I) = \frac{1}{2}$.
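The two numerical claims of Example 3 can be checked by hand. The following worked computation is our own sketch, with $I = \lambda x.x$ and $\Omega$ contributing no converging mass.
- In $C[M]$, each of $xI$ and $yI$ reduces to $I \oplus \Omega$, and the final application converges only if both choices pick $I$.
- For the trace distance, after one application the two behaviours differ only by the factor $\frac12$ in front of $\Pr(I, t')$.

$$\begin{aligned}
\textstyle\sum[\![C[M]]\!] &= \textstyle\sum[\![I \oplus \Omega]\!] \cdot \sum[\![I \oplus \Omega]\!] = \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{4}, \qquad \textstyle\sum[\![C[N]]\!] = 1,\\
|\Pr(\lambda z.(I\oplus\Omega), @V\cdot t') - \Pr(\lambda z.I, @V\cdot t')| &= |\tfrac{1}{2}\Pr(I, t') - \Pr(I, t')| = \tfrac{1}{2}\Pr(I, t') \le \tfrac{1}{2}.
\end{aligned}$$

Equality holds when $t'$ is the empty trace. The empty trace itself gives difference 0, since both terms are values.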

Projections cannot recover the context distance because the let construct above gives access to both components of a pair, and the distances induced by the two components can add up. A way out is to extend the trace language to pairs in a way that really follows linearity. We consider a new action of the form $\circledast L$, where $x, y \vdash L$, with the following extension of the trace interpretation:

$$\Pr(\langle M, N \rangle, \circledast L \cdot t) = \sum_{V,W} [\![M]\!](V)\cdot[\![N]\!](W)\cdot\Pr(L\{V, W/x, y\}, t)$$

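Please observe that we could in fact express pairs in the original language [2]. Consider the translation $\theta$ from terms of the calculus with pairs to terms of the original calculus, defined by (with $x$ fresh in the first clause)

$$\begin{aligned}
\theta(\langle M, N \rangle) &:= \lambda x.x\,\theta(M)\,\theta(N)\\
\theta(\mathtt{let}\ \langle x, y \rangle = M\ \mathtt{in}\ N) &:= \theta(M)\,(\lambda x.\lambda y.\theta(N))\\
\theta(\lambda x.M) &:= \lambda x.\theta(M) \quad \cdots
\end{aligned}$$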
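Moreover, every trace of the language with pairs can be seen as a trace of the original language. We extend $\theta$ to a map from the traces of the calculus with pairs to $\mathrm{Tr}$, where $\epsilon$ denotes the empty trace:

$$\theta(\epsilon) = \epsilon, \qquad \theta(@V \cdot s) = @\theta(V) \cdot \theta(s), \qquad \theta(\circledast M \cdot s) = @(\lambda x.\lambda y.\theta(M)) \cdot \theta(s).$$

For every term $M$ and every trace $s$ of the calculus with pairs, we have

$$\Pr(M, s) = \Pr(\theta(M), \theta(s)).$$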
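The translation $\theta$ can be tested on Example 3. The Python sketch below is our own illustration, not code from the paper. It implements $\theta$ for pairs together with a small call-by-value evaluator for the pure calculus, and recomputes the convergence probabilities that give the lower bound $1 - \frac14 = \frac34$ on the context distance. The following are assumptions of the sketch:
- the tuple encoding of terms;
- the names `theta`, `step` and `convergence`;
- the fresh variable `'_p'`;
- the modelling of Ω as a term that reduces to the empty sub-distribution.

```python
from fractions import Fraction

# Hypothetical encoding: ('var', x) | ('lam', x, M) | ('app', M, N) | ('plus', M, N)
# | ('omega',) | ('pair', M, N) | ('let', x, y, M, N) for let <x, y> = M in N.

def theta(m):
    """Translate pairs away; '_p' is assumed not to occur in source terms."""
    tag = m[0]
    if tag == 'pair':  # <M, N>  becomes  lambda p. p theta(M) theta(N)
        return ('lam', '_p', ('app', ('app', ('var', '_p'), theta(m[1])), theta(m[2])))
    if tag == 'let':   # let <x, y> = M in N  becomes  theta(M) (lambda x. lambda y. theta(N))
        _, x, y, body, cont = m
        return ('app', theta(body), ('lam', x, ('lam', y, theta(cont))))
    if tag == 'lam':
        return ('lam', m[1], theta(m[2]))
    if tag in ('app', 'plus'):
        return (tag, theta(m[1]), theta(m[2]))
    return m

def subst(m, x, v):
    """M{V/x} for a closed value V, so no capture is possible."""
    tag = m[0]
    if tag == 'var':
        return v if m[1] == x else m
    if tag == 'lam':
        return m if m[1] == x else ('lam', m[1], subst(m[2], x, v))
    if tag in ('app', 'plus'):
        return (tag, subst(m[1], x, v), subst(m[2], x, v))
    return m

def step(m):
    """Call-by-value step as a list of (probability, term); None for values.
    Omega steps to [] so that its mass is lost (an assumption of the sketch)."""
    tag = m[0]
    if tag == 'omega':
        return []
    if tag == 'plus':
        return [(Fraction(1, 2), m[1]), (Fraction(1, 2), m[2])]
    if tag == 'app':
        f, a = m[1], m[2]
        if f[0] != 'lam':
            return [(p, ('app', g, a)) for p, g in step(f)]
        if a[0] != 'lam':
            return [(p, ('app', f, b)) for p, b in step(a)]
        return [(Fraction(1), subst(f[2], f[1], a))]
    return None

def convergence(m):
    """Probability that the closed affine program m reaches a value."""
    dist, total = {m: Fraction(1)}, Fraction(0)
    while dist:
        new = {}
        for t, p in dist.items():
            r = step(t)
            if r is None:  # t is a value
                total += p
                continue
            for q, u in r:
                new[u] = new.get(u, 0) + p * q
        dist = new
    return total

I = ('lam', 'x', ('var', 'x'))
A = ('lam', 'z', ('plus', I, ('omega',)))  # lambda z.(I (+) Omega)
B = ('lam', 'z', I)                        # lambda z.I

def ctx(t):
    """let <x, y> = t in (x I)(y I)"""
    return ('let', 'x', 'y', t,
            ('app', ('app', ('var', 'x'), I), ('app', ('var', 'y'), I)))

print(convergence(theta(ctx(('pair', A, A)))))  # 1/4
print(convergence(theta(ctx(('pair', B, B)))))  # 1
```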

With this treatment of pairs, the trace distance and the context distance coincide again. However, the trace distance then loses its advantage over the context distance. Consider, for instance, the terms $M$ and $N$ from Example 3. Showing an upper bound on the distance between $M$ and $N$ amounts to showing an upper bound on

$$\delta^{tr}(L\{\lambda z.(I \oplus \Omega), \lambda z.(I \oplus \Omega)/x, y\}, L\{\lambda z.I, \lambda z.I/x, y\})$$

for all terms $L$ such that $x, y \vdash L$. This is not far from what we would have to show if we were considering the context distance directly.

5 The Bisimulation Distance 

As we saw in the last section, the trace metric can alleviate the burden of evaluating the context distance between terms. Its usefulness can be limited, however, in particular in the presence of pairs. In this section, we look at another way to define the distance between programs, which is genuinely coinductive and based on the Kantorovich metric for distributions.

5.1 Definition 

A labelled Markov chain (LMC) is a triple $\mathcal{M} = (S, \mathcal{A}, \mathbb{P})$, where:
- $S$ is a countable set of states;
- $\mathcal{A}$ is a countable set of labels;
- $\mathbb{P}$ is a transition probability matrix, that is, a function $\mathbb{P} : S \times \mathcal{A} \to \mathrm{Distr}(S)$.

If the image of $\mathbb{P}$ only consists of distributions with finite support, we call $\mathcal{M}$ an image-finite LMC.

We now define the metric analogue of bisimulation, in a similar way to [?] but in the absence of non-determinism. The idea is to define a metric on the set $S$ of states of the LMC as the greatest fixed point of a monotone operator on metrics. Recall that $(\mathfrak{M}(S), \le^{metric})$, where $\mathfrak{M}(S)$ is the set of metrics on $S$, is a complete lattice. Any monotone operator on it therefore has a greatest fixed point.

Lifting Metrics to Distributions 

We now define a way to turn any premetric over a set $S$ into a metric over finite distributions over $S$.

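Definition 12 Let $\mu$ be a premetric on a set $S$. We define the lifting of $\mu$ as the following metric on the set of finite distributions over $S$. For all finite distributions $\mathcal{D}$, $\mathcal{E}$ over $S$, $\mu(\mathcal{D}, \mathcal{E})$ is the optimal value of the linear program:

$$\begin{aligned}
\min\ & \sum_{i,j} l_{i,j}\cdot\mu(s_i, s_j) + \sum_i w_i + \sum_j z_j\\
\text{subject to}\ & \forall j,\ \textstyle\sum_i l_{i,j} + z_j = \mathcal{E}(s_j)\\
& \forall i,\ \textstyle\sum_j l_{i,j} + w_i = \mathcal{D}(s_i)\\
& l_{i,j}, z_j, w_i \ge 0.
\end{aligned}$$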

Please observe that this linear program has an optimal solution. It is feasible (take $l_{i,j} = 0$, $w_i = \mathcal{D}(s_i)$ and $z_j = \mathcal{E}(s_j)$), and its objective is bounded below by 0. Duality in linear programming then gives an alternative characterisation of lifting:
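Theorem 3 Let $\mu$ be a premetric on $S$ and let $\mathcal{D}$, $\mathcal{E}$ be finite distributions over $S$. Then:

$$\begin{aligned}
\mu(\mathcal{D}, \mathcal{E}) = \max\ & \sum_{s} a_s\cdot\mathcal{D}(s) + b_s\cdot\mathcal{E}(s)\\
\text{subject to}\ & \forall s \in S,\ a_s \le 1;\\
& \forall s \in S,\ b_s \le 1;\\
& \forall s, t \in S,\ a_s + b_t \le \mu(s, t).
\end{aligned}$$

Proof. By the strong duality theorem of linear programming.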


□ 
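The linear program of Definition 12 is a transportation problem in which mass left unmatched on either side costs 1. The Python sketch below is our own illustration, not the paper's method. It adds a dummy source node `'#L'` and a dummy sink node `'#R'` that supply or absorb unmatched mass at cost 1, which turns the problem into a balanced min-cost flow. It then solves that flow by successive shortest paths (Bellman-Ford), using exact rational arithmetic. The function name `lift` and the encoding of distributions as dictionaries are assumptions.

```python
from fractions import Fraction

def lift(mu, d, e):
    """Optimal value of the linear program of Definition 12 for the premetric mu
    (a function of two states) and finite distributions d, e (dicts state -> mass).
    Unmatched mass of d goes to '#R', unmatched mass of e comes from '#L'; both
    cost 1, while moving mass from s to t costs mu(s, t)."""
    left, right = list(d) + ['#L'], list(e) + ['#R']
    supply = [Fraction(d[s]) for s in d] + [sum(map(Fraction, e.values()), Fraction(0))]
    demand = [Fraction(e[t]) for t in e] + [sum(map(Fraction, d.values()), Fraction(0))]
    nl, nr = len(left), len(right)
    src, snk = 0, nl + nr + 1  # nodes: src, left 1..nl, right nl+1..nl+nr, snk
    cap, cost = {}, {}

    def add_edge(u, v, c, w):
        cap[(u, v)], cost[(u, v)] = c, w
        cap[(v, u)], cost[(v, u)] = 0, -w  # residual (reverse) edge

    big = sum(supply)
    for i in range(nl):
        add_edge(src, 1 + i, supply[i], 0)
        for j in range(nr):
            if i == nl - 1 and j == nr - 1:
                w = 0  # dummy to dummy: nothing is paid
            elif i == nl - 1 or j == nr - 1:
                w = 1  # unmatched mass (some w_i or z_j)
            else:
                w = Fraction(mu(left[i], right[j]))
            add_edge(1 + i, 1 + nl + j, big, w)
    for j in range(nr):
        add_edge(1 + nl + j, snk, demand[j], 0)

    total = Fraction(0)
    while True:
        # Bellman-Ford on the residual graph, which may contain negative costs.
        dist, prev = {src: Fraction(0)}, {}
        for _ in range(snk + 1):
            for (u, v), c in cap.items():
                if c > 0 and u in dist and (v not in dist or dist[u] + cost[(u, v)] < dist[v]):
                    dist[v], prev[v] = dist[u] + cost[(u, v)], u
        if snk not in dist:
            return total
        path, v = [], snk
        while v != src:
            path.append((prev[v], v))
            v = prev[v]
        push = min(cap[edge] for edge in path)
        for (u, v) in path:
            cap[(u, v)] -= push
            cap[(v, u)] += push
        total += push * dist[snk]

third = lambda s, t: 0 if s == t else Fraction(1, 3)
print(lift(third, {'s': 1}, {'t': 1}))  # 1/3
```

On Dirac distributions, the sketch returns the underlying distance, in line with Lemma 14. With the discrete premetric, moving mass costs 1, which is never worse than leaving it unmatched on both sides (cost 2).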


We would like the lifting of a premetric $\mu$ to behave coherently with $\mu$ itself. First of all, we should be able to recover $\mu$ from its lifting by considering Dirac distributions:

Lemma 14 Let $\mu$ be a premetric on $S$, and let $s, t \in S$. Then $\mu(\{s^1\}, \{t^1\}) = \mu(s, t)$.

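Proof. Let $s, t \in S$, and let $\mathcal{D} = \{s^1\}$, $\mathcal{E} = \{t^1\}$. Then:

$$\begin{aligned}
\mu(\mathcal{D}, \mathcal{E}) &= \max\Big\{\sum_u a_u\cdot\mathcal{D}(u) + b_u\cdot\mathcal{E}(u) \;\Big|\; \forall u \in S,\ a_u \le 1 \wedge b_u \le 1;\ \forall u_1, u_2 \in S,\ a_{u_1} + b_{u_2} \le \mu(u_1, u_2)\Big\}\\
&= \max\{a_s + b_t \mid a_s \le 1 \wedge b_t \le 1 \wedge a_s + b_t \le \mu(s, t)\}\\
&= \mu(s, t).
\end{aligned}$$

The second equality holds because the variables other than $a_s$ and $b_t$ do not occur in the objective, so they can be chosen arbitrarily small. The last equality holds because $a_s = \mu(s, t)$, $b_t = 0$ is an optimal choice. □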

If a premetric on states satisfies the triangle inequality, its lifting satisfies the triangle inequality too. This is a consequence of the following lemma:

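Lemma 15 Let $\mu$, $\rho$, $\nu$ be three premetrics on $S$ such that $\mu(s, t) \le \rho(s, u) + \nu(u, t)$ for all $s, t, u \in S$. Let $\mathcal{D}$, $\mathcal{E}$, $\mathcal{F}$ be finite distributions over $S$. Then $\mu(\mathcal{D}, \mathcal{F}) \le \rho(\mathcal{D}, \mathcal{E}) + \nu(\mathcal{E}, \mathcal{F})$.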

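Proof. Let $\mathcal{D}$, $\mathcal{E}$, $\mathcal{F}$ be finite distributions over $S$. We use the minimum-based definition of lifting. Let $\varepsilon$ and $\eta$ be such that $\rho(\mathcal{D}, \mathcal{E}) = \varepsilon$ and $\nu(\mathcal{E}, \mathcal{F}) = \eta$. Since the distributions are finite, only finitely many states appear in the union of their supports. We number these states from 1 to $n$.

Let $(l_{i,j})_{1 \le i,j \le n}$, $(x_i)_{1 \le i \le n}$, $(y_j)_{1 \le j \le n}$ be coefficients for which the minimum of the optimisation problem associated with $\rho(\mathcal{D}, \mathcal{E})$ is reached. They satisfy the following equations:

$$\begin{aligned}
&\varepsilon = \textstyle\sum_{i,j} l_{i,j}\cdot\rho(s_i, s_j) + \sum_i x_i + \sum_j y_j\\
&\forall j,\ \textstyle\sum_i l_{i,j} + y_j = \mathcal{E}(s_j)\\
&\forall i,\ \textstyle\sum_j l_{i,j} + x_i = \mathcal{D}(s_i)\\
&\forall i, j:\ l_{i,j}, x_i, y_j \ge 0
\end{aligned}$$

Similarly, let $(h_{j,k})_{1 \le j,k \le n}$, $(w_j)_{1 \le j \le n}$, $(z_k)_{1 \le k \le n}$ be coefficients that reach the minimum for the optimisation problem associated with $\nu(\mathcal{E}, \mathcal{F})$. They satisfy the following equations:

$$\begin{aligned}
&\eta = \textstyle\sum_{j,k} h_{j,k}\cdot\nu(s_j, s_k) + \sum_j w_j + \sum_k z_k\\
&\forall k,\ \textstyle\sum_j h_{j,k} + z_k = \mathcal{F}(s_k)\\
&\forall j,\ \textstyle\sum_k h_{j,k} + w_j = \mathcal{E}(s_j)\\
&\forall j, k:\ h_{j,k}, w_j, z_k \ge 0
\end{aligned}$$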
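We want to show that $\mu(\mathcal{D}, \mathcal{F}) \le \varepsilon + \eta$. To do so, we look for non-negative coefficients $(r_{i,k})$, $(a_i)$, $(b_k)$ that satisfy the constraints of the optimisation problem associated with $\mu(\mathcal{D}, \mathcal{F})$, and for which the objective function is bounded by $\varepsilon + \eta$. That is, we would like to have:

$$\forall k,\ \sum_i r_{i,k} + b_k = \mathcal{F}(s_k) \qquad (9)$$

$$\forall i,\ \sum_k r_{i,k} + a_i = \mathcal{D}(s_i) \qquad (10)$$

$$\sum_{i,k} r_{i,k}\cdot\mu(s_i, s_k) + \sum_i a_i + \sum_k b_k \le \varepsilon + \eta \qquad (11)$$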


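To achieve this, we define the $r_{i,k}$, $a_i$ and $b_k$ as follows:

$$r_{i,k} = \sum_j \frac{l_{i,j}\cdot h_{j,k}}{\mathcal{E}(s_j)} \qquad a_i = x_i + \sum_j \frac{l_{i,j}\cdot w_j}{\mathcal{E}(s_j)} \qquad b_k = z_k + \sum_j \frac{h_{j,k}\cdot y_j}{\mathcal{E}(s_j)}$$

We adopt the following convention: if $\mathcal{E}(s_j) = 0$, then $h_{j,k} = 0 = l_{i,j}$, and the corresponding quotients are taken to be 0. All these coefficients are clearly non-negative.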

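We now show that this choice of coefficients has the required properties. We first verify that equation (10) holds:

$$\sum_k r_{i,k} + a_i = \sum_k \sum_j \frac{l_{i,j}\cdot h_{j,k}}{\mathcal{E}(s_j)} + x_i + \sum_j \frac{l_{i,j}\cdot w_j}{\mathcal{E}(s_j)} = x_i + \sum_j l_{i,j}\cdot\frac{\sum_k h_{j,k} + w_j}{\mathcal{E}(s_j)} = x_i + \sum_j l_{i,j} = \mathcal{D}(s_i).$$

A very similar computation shows that equation (9) holds, that is, $\sum_i r_{i,k} + b_k = \mathcal{F}(s_k)$.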

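We now verify that equation (11) holds. The computation uses three facts: $\mu(s_i, s_k) \le \rho(s_i, s_j) + \nu(s_j, s_k)$, $\sum_k h_{j,k} \le \mathcal{E}(s_j)$, and $\sum_i l_{i,j} \le \mathcal{E}(s_j)$:

$$\begin{aligned}
&\sum_{i,k} r_{i,k}\cdot\mu(s_i, s_k) + \sum_i a_i + \sum_k b_k\\
&= \sum_{i,k} \sum_j \frac{l_{i,j}\cdot h_{j,k}}{\mathcal{E}(s_j)}\cdot\mu(s_i, s_k) + \sum_i \Big(x_i + \sum_j \frac{l_{i,j}\cdot w_j}{\mathcal{E}(s_j)}\Big) + \sum_k \Big(z_k + \sum_j \frac{h_{j,k}\cdot y_j}{\mathcal{E}(s_j)}\Big)\\
&\le \sum_{i,k} \sum_j \frac{l_{i,j}\cdot h_{j,k}}{\mathcal{E}(s_j)}\cdot\big(\rho(s_i, s_j) + \nu(s_j, s_k)\big) + \sum_i \Big(x_i + \sum_j \frac{l_{i,j}\cdot w_j}{\mathcal{E}(s_j)}\Big) + \sum_k \Big(z_k + \sum_j \frac{h_{j,k}\cdot y_j}{\mathcal{E}(s_j)}\Big)\\
&= \sum_{i,j} l_{i,j}\cdot\rho(s_i, s_j)\cdot\frac{\sum_k h_{j,k}}{\mathcal{E}(s_j)} + \sum_{j,k} h_{j,k}\cdot\nu(s_j, s_k)\cdot\frac{\sum_i l_{i,j}}{\mathcal{E}(s_j)} + \sum_i x_i + \sum_j w_j\cdot\frac{\sum_i l_{i,j}}{\mathcal{E}(s_j)} + \sum_k z_k + \sum_j y_j\cdot\frac{\sum_k h_{j,k}}{\mathcal{E}(s_j)}\\
&\le \sum_{i,j} l_{i,j}\cdot\rho(s_i, s_j) + \sum_{j,k} h_{j,k}\cdot\nu(s_j, s_k) + \sum_i x_i + \sum_j w_j + \sum_k z_k + \sum_j y_j\\
&= \varepsilon + \eta.
\end{aligned}$$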



□ 
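The gluing construction in the proof of Lemma 15 can be checked mechanically. The Python sketch below is our own illustration, and its function names and three-state example are hypothetical. It composes a plan $(l, x, y)$ for $\rho(\mathcal{D},\mathcal{E})$ with a plan $(h, w, z)$ for $\nu(\mathcal{E},\mathcal{F})$ exactly as in the definition of $r_{i,k}$, $a_i$ and $b_k$. It then checks constraints (9) and (10) and compares objectives as in (11). The derivation of (11) only uses feasibility, not optimality, so the example uses plans that are merely feasible.

```python
from fractions import Fraction as Fr

def quot(num, den):
    """num / den, with the convention of the proof that the quotient is 0 when den = 0."""
    return Fr(0) if den == 0 else Fr(num) / den

def compose(l, x, y, h, w, z, e):
    """Glue a plan (l, x, y) for rho(D, E) with a plan (h, w, z) for nu(E, F)
    into the plan (r, a, b) for mu(D, F) used in the proof of Lemma 15."""
    n = len(e)
    r = [[sum(quot(l[i][j] * h[j][k], e[j]) for j in range(n)) for k in range(n)]
         for i in range(n)]
    a = [x[i] + sum(quot(l[i][j] * w[j], e[j]) for j in range(n)) for i in range(n)]
    b = [z[k] + sum(quot(h[j][k] * y[j], e[j]) for j in range(n)) for k in range(n)]
    return r, a, b

def feasible(plan, unm_src, unm_dst, src, dst):
    """Equality constraints of Definition 12 (row and column marginals)."""
    n = len(src)
    rows = all(sum(plan[i]) + unm_src[i] == src[i] for i in range(n))
    cols = all(sum(plan[i][j] for i in range(n)) + unm_dst[j] == dst[j] for j in range(n))
    return rows and cols

def objective(plan, unm_src, unm_dst, dist):
    n = len(unm_src)
    moved = sum(plan[i][j] * dist(i, j) for i in range(n) for j in range(n))
    return moved + sum(unm_src) + sum(unm_dst)

# States 0, 1, 2 on a line: dist(i, j) = |i - j| / 2 satisfies the triangle inequality.
dist = lambda i, j: Fr(abs(i - j), 2)
D, E, F = [Fr(1, 2), Fr(1, 2), 0], [0, 1, 0], [0, 0, 1]
l, x, y = [[0, Fr(1, 2), 0], [0, Fr(1, 2), 0], [0, 0, 0]], [0, 0, 0], [0, 0, 0]
h, w, z = [[0, 0, 0], [0, 0, Fr(1, 2)], [0, 0, 0]], [0, Fr(1, 2), 0], [0, 0, Fr(1, 2)]
r, a, b = compose(l, x, y, h, w, z, E)
print(feasible(r, a, b, D, F))  # True
print(objective(r, a, b, dist), objective(l, x, y, dist) + objective(h, w, z, dist))  # 11/8 3/2
```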


Metrics as Fixpoints 

In a non-probabilistic setting, a relation $R$ is a bisimulation if any two states $s$, $t$ such that $s \mathrel{R} t$ can perform the same actions and end up in states that are still related. More precisely, for every action $a$ and every state $u$ such that $s \xrightarrow{a} u$, there exists $v$ such that $t \xrightarrow{a} v$ and $u \mathrel{R} v$ (and symmetrically).

To obtain a quantitative counterpart of this scheme, we define an operator $F$ on the set of metrics over the states of an LMC. Intuitively, given a metric $\mu$, the new metric $F(\mu)$ corresponds to the distance obtained by first doing a step of the transition relation, and then applying the lifting of $\mu$ to the resulting distributions.

More precisely, given two states $s$ and $t$, $F(\mu)(s, t)$ is computed as follows. For every action $a$, we consider the distance (with respect to the lifting of $\mu$) between the distribution reached from $s$ by the action $a$ and the distribution reached from $t$ by the same action $a$. We then take the supremum of these quantities over all actions $a$.

Definition 13 Let $\mathcal{M} = (S, \mathcal{A}, \mathbb{P})$ be an image-finite LMC. We define an operator $F$ on $\mathfrak{M}(S)$ as

$$F(\mu)(s, t) = \sup\{\mu(\mathbb{P}(s, a), \mathbb{P}(t, a)) \mid a \in \mathcal{A}\}.$$

Theorem 4 For any image-finite LMC $\mathcal{M}$, $F$ has a greatest fixpoint. We call it the bisimulation metric for $\mathcal{M}$, and we write it $\delta^b_{\mathcal{M}}$.

Bisimulation Metric and the Affine λ-Calculus

We now consider a specific LMC, which captures the interactive behaviour of our calculus.

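Definition 14 We define the LMC $\mathcal{M}_{\lambda} = (S_{\lambda}, \mathcal{A}_{\lambda}, \mathbb{P}_{\lambda})$, where:

• The set of states is defined as follows:

$$S_{\lambda} = \mathcal{P} \uplus \hat{\mathcal{V}}.$$

A value $V$ in the second component of $S_{\lambda}$ is distinguished from one in the first component by the notation $\hat{V}$.

• The set of labels is

$$\mathcal{A}_{\lambda} = \{@V \mid V \text{ a value}\} \cup \{eval\}.$$

• The transition probability matrix $\mathbb{P}_{\lambda}$ is such that:
  - for every $M \in \mathcal{P}$ and every value $V \in S([\![M]\!])$, $\mathbb{P}_{\lambda}(M, eval)(\hat{V}) = [\![M]\!](V)$;
  - for every term $M$ such that $\lambda x.M \in \mathcal{P}$ and every $V \in \mathcal{V}$, $\mathbb{P}_{\lambda}(\lambda x.M, @V)(M\{V/x\}) = 1$.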
The results proved previously in this section apply to $\mathcal{M}_{\lambda}$. In particular, one can define the bisimulation metric $\delta^b_{\mathcal{M}_{\lambda}}$ on $\mathcal{M}_{\lambda}$. The bisimulation distance on programs, which we write $\delta^b$, is defined as the restriction of $\delta^b_{\mathcal{M}_{\lambda}}$ to programs.

Definition 15 We define a metric $\delta^b$ on the set of closed terms by: for every $M$, $N$, $\delta^b(M, N) = \delta^b_{\mathcal{M}_{\lambda}}(M, N)$.

Lemma 16 For this particular LMC, we have:

$$F(\mu)(\lambda x.M, \lambda x.N) = \sup\{\mu(M\{V/x\}, N\{V/x\}) \mid V \text{ a value}\}$$

$$F(\mu)(M, N) = \mu([\![M]\!], [\![N]\!])$$

F{p){M,V)=Q 

We can easily see that $\delta^b$ is an adequate metric.

Lemma 17 $\delta^b$ is an adequate metric on programs.

Proof. We have to show: for every M, N: 


□ 

But there is more. The bisimulation metric is well known to be an upper bound on the trace distance: the bisimulation distance is a sound metric. In the next section, we nevertheless prove non-expansiveness for it directly, which is a stronger result.

5.2 Non-Expansiveness 

The non-expansiveness of $\delta^b$ cannot be proved directly by a plain induction on contexts. Our strategy is Howe's technique [?], a method for proving congruence of coinductively defined equivalences. It has been widely used for deterministic and non-deterministic languages, and we here adapt it to metrics.

The idea, then, is to start from $\delta^b$, to construct another metric $(\delta^b)^H$ on top of it (which turns out to be non-expansive by construction), and to show that $(\delta^b)^H = \delta^b$. We first need to transform our metric on programs into a metric on (potentially open) terms. Any metric $\mu$ on programs can be extended into a metric on open terms, which by abuse of notation we continue to call $\mu$, as follows:

$$\mu(M, N) = \sup_{V_1, \ldots, V_n} \mu(M\{V_1, \ldots, V_n/x_1, \ldots, x_n\}, N\{V_1, \ldots, V_n/x_1, \ldots, x_n\}),$$

where $x_1, \ldots, x_n$ are the variables occurring free in either $M$ or $N$.

Definition 16 Let $\mu$ be a metric on terms. A Howe judgement is an element of the form $(\Gamma, (M, N), \varepsilon)$, where $\Gamma$ is a typing context, $M$ and $N$ are two terms, and $\varepsilon \in [0,1]$. We say that a Howe judgement is valid, and we write $\Gamma \vdash \mu^H(M, N) \le \varepsilon$, if it can be derived by the rules of Figure 5.

Please observe that there may be several different $\varepsilon$ such that $\Gamma \vdash \mu^H(M, N) \le \varepsilon$.

We are finally in a position to define Howe's lifting of $\mu$:

Definition 17 Let $\mu$ be a metric on terms. We define a premetric $\mu^H$ on terms by:

$$\mu^H(M, N) = \inf\big(\{\varepsilon \mid \exists \Gamma,\ \Gamma \vdash \mu^H(M, N) \le \varepsilon\} \cup \{1\}\big).$$
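$$\frac{\mu(x, M) \le \varepsilon \qquad x, \Gamma \vdash M}{x, \Gamma \vdash \mu^H(x, M) \le \varepsilon}$$

$$\frac{\Gamma \vdash \mu^H(M, K) \le \varepsilon \qquad \Delta \vdash \mu^H(N, T) \le \gamma \qquad \mu(KT, L) \le \iota \qquad \Gamma, \Delta \vdash L}{\Gamma, \Delta \vdash \mu^H(MN, L) \le \varepsilon + \gamma + \iota}$$

$$\frac{\Gamma \vdash \mu^H(M, K) \le \varepsilon \qquad \Gamma \vdash \mu^H(N, T) \le \iota \qquad \mu(K \oplus T, L) \le \gamma \qquad \Gamma \vdash L}{\Gamma \vdash \mu^H(M \oplus N, L) \le \frac{\varepsilon + \iota}{2} + \gamma}$$

$$\frac{x, \Gamma \vdash \mu^H(M, K) \le \varepsilon \qquad \mu(\lambda x.K, L) \le \delta \qquad \Gamma \vdash L}{\Gamma \vdash \mu^H(\lambda x.M, L) \le \varepsilon + \delta}$$

Figure 5: Rules for Howe's construction on metrics.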


The following lemma says that the optimal value of $\varepsilon$ can be reached with any typing context $\Gamma$ which contains the free variables of $M$ and $N$.

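Lemma 18 For all terms $M, N$, every typing context $\Gamma$, and every real $\varepsilon$ such that $\Gamma \vdash \mu^{H}(M,N) \leq \varepsilon$, we have $FV(M) \cup FV(N) \subseteq \Gamma$. Moreover, for any context $\Delta$ such that $\{\iota \mid \Delta \vdash \mu^{H}(M,N) \leq \iota\} \neq \emptyset$, we have $\inf\{\iota \mid \Delta \vdash \mu^{H}(M,N) \leq \iota\} \leq \varepsilon$.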

It is easy to see that $\mu^{H}$ is a premetric on open terms. Observe that it is not necessarily a metric, since its construction guarantees neither symmetry nor the triangle inequality.

Lemma 19 If $\mu$ is any premetric on closed terms, then $\mu^{H}$ is a premetric on (potentially open) terms.

Lemma 20 For all terms $M, N$:

$$\mu^{H}(M,N) \leq \mu(M,N).$$

The construction of Howe's lifting yields the following two properties:

Lemma 21 (Pseudo-Transitivity) Let $\mu$ be a metric on terms. For all terms $M, N, L$:

$$\mu^{H}(M,N) \leq \mu^{H}(M,L) + \mu(L,N).$$

Proof. Let $\varepsilon$ be such that $\Gamma \vdash \mu^{H}(M,L) \leq \varepsilon$ is a valid judgement. It is enough to show that

$$\Gamma \vdash \mu^{H}(M,N) \leq \varepsilon + \mu(L,N)$$

is a valid judgement. The proof is by induction on the derivation of $\Gamma \vdash \mu^{H}(M,L) \leq \varepsilon$.

□


Lemma 22 (Pseudo-Substitutivity) Suppose that, for all terms $M, N$ and every value $V$, $\mu(M\{V/x\}, N\{V/x\}) \leq \mu(M,N)$. Then for all terms $M, N$ and all values $V, W$:

$$\mu^{H}(M\{V/x\}, N\{W/x\}) \leq \mu^{H}(M,N) + \mu^{H}(V,W).$$

Observe that the open extension of a metric on closed terms satisfies this hypothesis.

Proof. Let $\varepsilon$ be such that $\Gamma \vdash \mu^{H}(M,N) \leq \varepsilon$. The proof is by induction on the structure of the derivation of $\Gamma \vdash \mu^{H}(M,N) \leq \varepsilon$.


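• If the derivation is:

$$\frac{\mu(x,N) \leq \varepsilon}{\Gamma, x \vdash \mu^{H}(x,N) \leq \varepsilon}$$

then $M\{V/x\} = V$. Since $\mu$ is value-substitutive, $\mu(W, N\{W/x\}) \leq \mu(x,N) \leq \varepsilon$. Now, by pseudo-transitivity of $\mu^{H}$ (Lemma 21), we have:

$$\mu^{H}(V, N\{W/x\}) \leq \mu^{H}(V,W) + \mu(W, N\{W/x\}) \leq \mu^{H}(V,W) + \varepsilon.$$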
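• If the derivation is:

$$\frac{\Gamma \vdash \mu^{H}(T,K) \leq \varepsilon \qquad \Delta \vdash \mu^{H}(U,P) \leq \gamma \qquad \Gamma,\Delta \vdash L \qquad \mu(KP,L) \leq \iota}{\Gamma,\Delta \vdash \mu^{H}(TU,L) \leq \varepsilon + \iota + \gamma}$$

We know that $x$ cannot appear both in $\Gamma$ and in $\Delta$. Suppose, for example, that $x$ does not appear in $\Delta$. Then (by Lemma 18) $x$ does not appear in $FV(U) \cup FV(P)$. We apply the induction hypothesis to $\Gamma \vdash \mu^{H}(T,K) \leq \varepsilon$, and obtain:

$$\mu^{H}(T\{V/x\}, K\{W/x\}) \leq \mu^{H}(T,K) + \mu^{H}(V,W) \leq \varepsilon + \mu^{H}(V,W).$$

Moreover, since $\mu(KP,L) \leq \iota$ and $\mu$ is value-substitutive, we have:

$$\mu((KP)\{W/x\}, L\{W/x\}) \leq \iota.$$

So we now have:

$$\frac{\Gamma \setminus x \vdash \mu^{H}(T\{V/x\},K\{W/x\}) \leq \varepsilon + \mu^{H}(V,W) \qquad \Delta \vdash \mu^{H}(U,P) \leq \gamma \qquad \mu(K\{W/x\}P, L\{W/x\}) \leq \iota \qquad \Gamma \setminus x, \Delta \vdash L\{W/x\}}{\Gamma \setminus x, \Delta \vdash \mu^{H}((TU)\{V/x\}, L\{W/x\}) \leq \varepsilon + \iota + \gamma + \mu^{H}(V,W)}$$

• Other cases are similar.

□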

The interest of this construction is that the premetric $\delta^{bH}$ is (more or less by construction) non-expansive:

Lemma 23 (Non-Expansiveness of $\delta^{bH}$) For every context $C$ and all terms $M, N$, it holds that $\delta^{bH}(C[M], C[N]) \leq \delta^{bH}(M,N)$.

Proof. The proof is by induction on the structure of the context $C$.

□


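The goal now is to show that $\delta^{bH} \leq_{\mathsf{metr}} \delta^{b}$. Since $\delta^{b}$ is the greatest fixed point of $F$ for our LMC $\mathcal{M}_{\oplus}$, we are going to show that $\delta^{bH}$ can be extended into a metric on the states of $\mathcal{M}_{\oplus}$ which is a pre-fixpoint of the operator $F$. First we extend $\delta^{bH}$ to a premetric on $\mathcal{S}_{\oplus}$: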


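Definition 18 We define the extension of $\delta^{bH}$ to $\mathcal{S}_{\oplus}$ (still written $\delta^{bH}$ by abuse of notation) by:

$$\delta^{bH}(\widehat{V}, \widehat{W}) = \delta^{bH}(V,W); \qquad \delta^{bH}(M, \widehat{W}) = 1.$$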


Since $\delta^{bH}$ is not guaranteed to be a metric, we are forced to refine it further, by adding rules corresponding to symmetry and to the triangle inequality: we define $\overline{\delta^{bH}}$ over $\mathcal{S}_{\oplus}$ by the rules of Figure 6.

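Definition 19 We define a valid $\overline{\delta^{bH}}$-judgement $\vdash \overline{\delta^{bH}}(s,t) \leq \varepsilon$, where $s,t \in \mathcal{S}_{\oplus}$ and $\varepsilon \in [0,1]$, as a judgement which has a finite proof tree built with the rules of Figure 6.

We define the metric $\overline{\delta^{bH}}$ over $\mathcal{S}_{\oplus}$ by:

$$\overline{\delta^{bH}}(s,t) = \inf\{\varepsilon \mid\ \vdash \overline{\delta^{bH}}(s,t) \leq \varepsilon\}.$$

Lemma 24 $\overline{\delta^{bH}}$ is a metric.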
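$$\frac{\delta^{bH}(s,t) \leq \varepsilon}{\vdash \overline{\delta^{bH}}(s,t) \leq \varepsilon} \qquad \frac{\vdash \overline{\delta^{bH}}(s,t) \leq \varepsilon \qquad \vdash \overline{\delta^{bH}}(t,s) \leq \iota}{\vdash \overline{\delta^{bH}}(s,t) \leq \min(\varepsilon,\iota)} \qquad \frac{\vdash \overline{\delta^{bH}}(s,t) \leq \varepsilon \qquad \vdash \overline{\delta^{bH}}(t,u) \leq \iota}{\vdash \overline{\delta^{bH}}(s,u) \leq \varepsilon + \iota}$$

Figure 6: Metric closure of $\delta^{bH}$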
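On a finite set of states, the infimum of Definition 19 can be computed directly: the first two rules of Figure 6 give, for each pair, the smaller of the two directed values of $\delta^{bH}$, and the triangle rule adds bounds along chains, so the best derivable bound is a shortest-path distance. The Python sketch below illustrates this under stated assumptions: the state set is finite (unlike $\mathcal{S}_{\oplus}$), the premetric is a dictionary on ordered pairs with values in $[0,1]$, and the name `metric_closure` is ours, not the paper's.

```python
from fractions import Fraction


def metric_closure(states, d):
    """Infimum of the bounds derivable with the three rules of Figure 6.

    `states` is a finite list and `d` maps every ordered pair (s, t) to a
    premetric value in [0, 1]; both are illustrative assumptions.
    """
    # Base rule plus symmetry rule: the best one-step bound for (s, t)
    # is the smaller of the two directed values.
    best = {(s, t): min(d[(s, t)], d[(t, s)]) for s in states for t in states}
    # Triangle rule: chaining two bounds adds them, so the infimum over all
    # finite proof trees is a shortest-path distance (Floyd-Warshall).
    for k in states:
        for s in states:
            for t in states:
                through_k = best[(s, k)] + best[(k, t)]
                if through_k < best[(s, t)]:
                    best[(s, t)] = through_k
    return best


# Hypothetical premetric on three states: neither symmetric nor triangular.
states = ["a", "b", "c"]
d = {(s, s): Fraction(0) for s in states}
d.update({("a", "b"): Fraction(1, 2), ("b", "a"): Fraction(9, 10),
          ("b", "c"): Fraction(1, 5), ("c", "b"): Fraction(3, 10),
          ("a", "c"): Fraction(1), ("c", "a"): Fraction(1)})
closure = metric_closure(states, d)
```

Here the direct value between `a` and `c` is 1, but the chain through `b` gives the smaller bound 7/10, and the result is symmetric even though the input is not.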


It is easy to see that $\delta^{b} \leq_{\mathsf{metr}} \overline{\delta^{bH}}$ with respect to the preorder on metrics. We want to show that $\overline{\delta^{bH}} = \delta^{b}$. To do so, we show that $\overline{\delta^{bH}} \leq_{\mathsf{metr}} \delta^{b}$, which is a direct consequence of the following theorem:

Theorem 5 $\overline{\delta^{bH}}$ is a pre-fixpoint of $F$.


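Proof. We need to show that $\overline{\delta^{bH}} \leq_{\mathsf{metr}} F(\overline{\delta^{bH}})$. Recall that the preorder on metrics is the reverse of the pointwise order on the distances between states. So if we read this inequality on metrics as an inequality on the states of $\mathcal{M}_{\oplus}$, it is equivalent to: for all $s,t \in \mathcal{S}_{\oplus}$, $F(\overline{\delta^{bH}})(s,t) \leq \overline{\delta^{bH}}(s,t)$. Unfolding the definition of the operator $F$ on metrics, this means that for every action $a \in \mathcal{A}_{\oplus}$, $\overline{\delta^{bH}}(\mathcal{P}_{\oplus}(s,a), \mathcal{P}_{\oplus}(t,a)) \leq \overline{\delta^{bH}}(s,t)$, where on the left-hand side the metric is lifted to distributions. Recall that there are two kinds of actions in our LMC: the action $\mathit{eval}$, which evaluates a program to obtain a value distribution, and the action $@V$, which passes the value $V$ to a distinguished value. Considering each of these actions separately, the result we want is equivalent to:

• Let $M, N$ be closed terms. Then $\overline{\delta^{bH}}(\llbracket M \rrbracket, \llbracket N \rrbracket) \leq \overline{\delta^{bH}}(M,N)$.

• Let $M, N$ be such that $x \vdash M$ and $x \vdash N$, and let $V$ be a value. Then $\overline{\delta^{bH}}(M\{V/x\}, N\{V/x\}) \leq \overline{\delta^{bH}}(\lambda x.M, \lambda x.N)$.

We first show these two results for the original premetric $\delta^{bH}$ on terms, and we extend them later to $\overline{\delta^{bH}}$.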


Lemma 25 (Key Lemma) Let $M$ and $N$ be two closed terms. Then:

$$\delta^{bH}(\llbracket M \rrbracket, \llbracket N \rrbracket) \leq \delta^{bH}(M,N).$$


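Proof. We show in fact that, for every $\varepsilon$ such that the judgement $\vdash \delta^{bH}(M,N) \leq \varepsilon$ is valid, it holds that $\delta^{bH}(\llbracket M \rrbracket, \llbracket N \rrbracket) \leq \varepsilon$. We show this by induction on the structure of the derivation of $M \Downarrow \llbracket M \rrbracket$.

• If $M$ is a value, then $M = \lambda x.T$, and the derivation of $M \Downarrow \llbracket M \rrbracket$ is of the following form:

$$\lambda x.T \Downarrow \{\widehat{\lambda x.T}^{1}\}$$

Then the proof tree certifying the validity of $\vdash \delta^{bH}(M,N) \leq \varepsilon$ must be of the form:

$$\frac{x \vdash \delta^{bH}(T,K) \leq \gamma \qquad \delta^{b}(\lambda x.K, N) \leq \iota \qquad \vdash N}{\vdash \delta^{bH}(\lambda x.T, N) \leq \varepsilon = \gamma + \iota}$$

Since $\delta^{b}$ is a fixpoint of $F$, we have $\delta^{b}(\llbracket \lambda x.K \rrbracket, \llbracket N \rrbracket) \leq \delta^{b}(\lambda x.K, N) \leq \iota$. And so, by Lemma 21:

$$\delta^{bH}(\llbracket \lambda x.T \rrbracket, \llbracket N \rrbracket) \leq \delta^{bH}(\llbracket \lambda x.T \rrbracket, \llbracket \lambda x.K \rrbracket) + \delta^{b}(\llbracket \lambda x.K \rrbracket, \llbracket N \rrbracket) \leq \delta^{bH}(\llbracket \lambda x.T \rrbracket, \llbracket \lambda x.K \rrbracket) + \iota.$$

Moreover, since $\lambda x.T$ and $\lambda x.K$ are values, we know that $\llbracket \lambda x.T \rrbracket = \{\widehat{\lambda x.T}^{1}\}$ and $\llbracket \lambda x.K \rrbracket = \{\widehat{\lambda x.K}^{1}\}$. By Definition 18, $\delta^{bH}(\llbracket \lambda x.T \rrbracket, \llbracket \lambda x.K \rrbracket) = \delta^{bH}(\lambda x.T, \lambda x.K)$. It follows that $\delta^{bH}(\llbracket \lambda x.T \rrbracket, \llbracket N \rrbracket) \leq \delta^{bH}(\lambda x.T, \lambda x.K) + \iota$, and the following proof tree allows us to conclude:

$$\frac{x \vdash \delta^{bH}(T,K) \leq \gamma \qquad \delta^{b}(\lambda x.K, \lambda x.K) \leq 0 \qquad \vdash \lambda x.K}{\vdash \delta^{bH}(\lambda x.T, \lambda x.K) \leq \gamma}$$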


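• If $M = UL$, then the derivation of $M \Downarrow \llbracket M \rrbracket$ is the following:

$$\frac{U \Downarrow \llbracket U \rrbracket \qquad L \Downarrow \llbracket L \rrbracket \qquad \{P\{V/x\} \Downarrow \llbracket P\{V/x\} \rrbracket\}_{\lambda x.P \in S(\llbracket U \rrbracket),\, V \in S(\llbracket L \rrbracket)}}{UL \Downarrow \sum_{\lambda x.P} \sum_{V} \llbracket U \rrbracket(\lambda x.P) \cdot \llbracket L \rrbracket(V) \cdot \llbracket P\{V/x\} \rrbracket}$$

And the proof tree corresponding to the validity of $\vdash \delta^{bH}(M,N) \leq \varepsilon$ has the following form:

$$\frac{\vdash \delta^{bH}(U,K) \leq \beta \qquad \vdash \delta^{bH}(L,T) \leq \gamma \qquad \delta^{b}(KT,N) \leq \iota}{\vdash \delta^{bH}(UL,N) \leq \varepsilon = \beta + \gamma + \iota}$$

We have:

$$\delta^{bH}(\llbracket UL \rrbracket, \llbracket N \rrbracket) \leq \delta^{bH}(\llbracket UL \rrbracket, \llbracket KT \rrbracket) + \delta^{b}(\llbracket KT \rrbracket, \llbracket N \rrbracket) \leq \delta^{bH}(\llbracket UL \rrbracket, \llbracket KT \rrbracket) + \iota.$$

So it is enough to show that $\delta^{bH}(\llbracket UL \rrbracket, \llbracket KT \rrbracket) \leq \beta + \gamma$.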
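Unfolding the definition of the lifting of $\delta^{bH}$ to distributions, we have:

$$\begin{aligned}
\delta^{bH}(\llbracket UL \rrbracket, \llbracket KT \rrbracket) &= \max\Big\{\sum_s \big(a_s \cdot \llbracket UL \rrbracket(s) + b_s \cdot \llbracket KT \rrbracket(s)\big) \;\Big|\; a_s \leq 1,\ b_s \leq 1,\ a_s + b_t \leq \delta^{bH}(s,t)\Big\}\\
&= \max\Big\{\sum_s \Big(a_s \sum_{P}\sum_{V} \llbracket U \rrbracket(\lambda x.P) \cdot \llbracket L \rrbracket(V) \cdot \llbracket P\{V/x\} \rrbracket(s)\\
&\qquad\quad + b_s \sum_{Q}\sum_{W} \llbracket K \rrbracket(\lambda x.Q) \cdot \llbracket T \rrbracket(W) \cdot \llbracket Q\{W/x\} \rrbracket(s)\Big) \;\Big|\; a_s \leq 1,\ b_s \leq 1,\ a_s + b_t \leq \delta^{bH}(s,t)\Big\}
\end{aligned}$$

We now use the dual characterisation of the lifting of a metric to distributions. We know that $\vdash \delta^{bH}(U,K) \leq \beta$, so by the induction hypothesis $\delta^{bH}(\llbracket U \rrbracket, \llbracket K \rrbracket) \leq \beta$. Hence there exist $(\omega_{P,Q})_{\lambda x.P \in S(\llbracket U \rrbracket),\, \lambda x.Q \in S(\llbracket K \rrbracket)}$, $(x_P)_{\lambda x.P \in S(\llbracket U \rrbracket)}$ and $(y_Q)_{\lambda x.Q \in S(\llbracket K \rrbracket)}$ such that:

$$\sum_{P,Q} \omega_{P,Q} \cdot \delta^{bH}(\lambda x.P, \lambda x.Q) + \sum_P x_P + \sum_Q y_Q = \delta^{bH}(\llbracket U \rrbracket, \llbracket K \rrbracket) \qquad (12)$$

$$\sum_P \omega_{P,Q} + y_Q = \llbracket K \rrbracket(\lambda x.Q) \qquad (13)$$

$$\sum_Q \omega_{P,Q} + x_P = \llbracket U \rrbracket(\lambda x.P) \qquad (14)$$

Observe that equation (13) implies in particular that $\sum_{P,Q} \omega_{P,Q} \leq 1$. Similarly, equations (14) and (13) imply that $\sum_P x_P \leq 1$ and $\sum_Q y_Q \leq 1$.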
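Substituting (13) and (14), and writing $\max_{a,b}$ for a maximum over all families $(a_s)_s, (b_s)_s$ such that $a_s \leq 1$, $b_s \leq 1$ and $a_s + b_t \leq \delta^{bH}(s,t)$, we obtain:

$$\begin{aligned}
\delta^{bH}(\llbracket UL \rrbracket, \llbracket KT \rrbracket)
&= \max_{a,b} \sum_s \Big(a_s \sum_P \Big(\sum_Q \omega_{P,Q} + x_P\Big) \sum_V \llbracket L \rrbracket(V) \llbracket P\{V/x\} \rrbracket(s) + b_s \sum_Q \Big(\sum_P \omega_{P,Q} + y_Q\Big) \sum_W \llbracket T \rrbracket(W) \llbracket Q\{W/x\} \rrbracket(s)\Big)\\
&= \max_{a,b} \Big(\sum_s \sum_{P,Q} \omega_{P,Q} \Big(a_s \sum_V \llbracket L \rrbracket(V) \llbracket P\{V/x\} \rrbracket(s) + b_s \sum_W \llbracket T \rrbracket(W) \llbracket Q\{W/x\} \rrbracket(s)\Big)\\
&\qquad + \sum_s a_s \sum_P x_P \sum_V \llbracket L \rrbracket(V) \llbracket P\{V/x\} \rrbracket(s) + \sum_s b_s \sum_Q y_Q \sum_W \llbracket T \rrbracket(W) \llbracket Q\{W/x\} \rrbracket(s)\Big)\\
&\leq \max_{a,b} \sum_s \sum_{P,Q} \omega_{P,Q} \Big(a_s \sum_V \llbracket L \rrbracket(V) \llbracket P\{V/x\} \rrbracket(s) + b_s \sum_W \llbracket T \rrbracket(W) \llbracket Q\{W/x\} \rrbracket(s)\Big)\\
&\qquad + \max_{a,b} \sum_s a_s \sum_P x_P \sum_V \llbracket L \rrbracket(V) \llbracket P\{V/x\} \rrbracket(s) + \max_{a,b} \sum_s b_s \sum_Q y_Q \sum_W \llbracket T \rrbracket(W) \llbracket Q\{W/x\} \rrbracket(s)\\
&\leq \max_{a,b} \sum_s \sum_{P,Q} \omega_{P,Q} \Big(a_s \sum_V \llbracket L \rrbracket(V) \llbracket P\{V/x\} \rrbracket(s) + b_s \sum_W \llbracket T \rrbracket(W) \llbracket Q\{W/x\} \rrbracket(s)\Big) + \sum_P x_P + \sum_Q y_Q,
\end{aligned}$$

where the last step uses $a_s, b_s \leq 1$ and the fact that the sum of a (sub)distribution is at most 1.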

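We can now apply the induction hypothesis to $\vdash \delta^{bH}(L,T) \leq \gamma$. We obtain that $\delta^{bH}(\llbracket L \rrbracket, \llbracket T \rrbracket) \leq \gamma$. So there exist $(h_{V,W})_{V \in S(\llbracket L \rrbracket),\, W \in S(\llbracket T \rrbracket)}$, $(w_V)_{V \in S(\llbracket L \rrbracket)}$ and $(z_W)_{W \in S(\llbracket T \rrbracket)}$ such that:

$$\sum_{V,W} h_{V,W} \cdot \delta^{bH}(V,W) + \sum_V w_V + \sum_W z_W = \delta^{bH}(\llbracket L \rrbracket, \llbracket T \rrbracket) \leq \gamma \qquad (15)$$

$$\sum_V h_{V,W} + z_W = \llbracket T \rrbracket(W) \qquad (16)$$

$$\sum_W h_{V,W} + w_V = \llbracket L \rrbracket(V) \qquad (17)$$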

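And now, using (16) and (17), and with $\max_{a,b}$ ranging over families $(a_s)_s, (b_s)_s$ such that $a_s \leq 1$, $b_s \leq 1$ and $a_s + b_t \leq \delta^{bH}(s,t)$, we have:

$$\begin{aligned}
\delta^{bH}(\llbracket UL \rrbracket, \llbracket KT \rrbracket)
&\leq \max_{a,b} \sum_s \sum_{P,Q} \omega_{P,Q} \Big(\sum_V \Big(\sum_W h_{V,W} + w_V\Big) \llbracket P\{V/x\} \rrbracket(s)\, a_s + \sum_W \Big(\sum_V h_{V,W} + z_W\Big) \llbracket Q\{W/x\} \rrbracket(s)\, b_s\Big) + \sum_P x_P + \sum_Q y_Q\\
&= \max_{a,b} \sum_{P,Q} \omega_{P,Q} \Big(\sum_{V,W} h_{V,W} \sum_s \big(\llbracket P\{V/x\} \rrbracket(s)\, a_s + \llbracket Q\{W/x\} \rrbracket(s)\, b_s\big) + \sum_V w_V \sum_s \llbracket P\{V/x\} \rrbracket(s)\, a_s + \sum_W z_W \sum_s \llbracket Q\{W/x\} \rrbracket(s)\, b_s\Big)\\
&\qquad + \sum_P x_P + \sum_Q y_Q\\
&\leq \sum_{P,Q} \omega_{P,Q} \Big(\sum_{V,W} h_{V,W} \max_{a,b} \sum_s \big(\llbracket P\{V/x\} \rrbracket(s)\, a_s + \llbracket Q\{W/x\} \rrbracket(s)\, b_s\big) + \sum_V w_V \max_{a,b} \sum_s \llbracket P\{V/x\} \rrbracket(s)\, a_s\\
&\qquad + \sum_W z_W \max_{a,b} \sum_s \llbracket Q\{W/x\} \rrbracket(s)\, b_s\Big) + \sum_P x_P + \sum_Q y_Q
\end{aligned}$$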


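By definition of the lifting, the first maximum is $\delta^{bH}(\llbracket P\{V/x\} \rrbracket, \llbracket Q\{W/x\} \rrbracket)$, which by the induction hypothesis (applied to the premises $P\{V/x\} \Downarrow \llbracket P\{V/x\} \rrbracket$) is at most $\delta^{bH}(P\{V/x\}, Q\{W/x\})$; the other two maxima are at most 1, since $a_s, b_s \leq 1$ and the sum of a (sub)distribution is at most 1. Hence:

$$\delta^{bH}(\llbracket UL \rrbracket, \llbracket KT \rrbracket) \leq \sum_{P,Q} \omega_{P,Q} \Big(\sum_{V,W} h_{V,W} \cdot \delta^{bH}(P\{V/x\}, Q\{W/x\}) + \sum_V w_V + \sum_W z_W\Big) + \sum_P x_P + \sum_Q y_Q$$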


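We can now use Lemma 22, which states that $\delta^{bH}$ is pseudo-substitutive:

$$\begin{aligned}
\delta^{bH}(\llbracket UL \rrbracket, \llbracket KT \rrbracket) &\leq \sum_{P,Q} \omega_{P,Q}\Big(\sum_{V,W} h_{V,W}\big(\delta^{bH}(P,Q) + \delta^{bH}(V,W)\big) + \sum_V w_V + \sum_W z_W\Big) + \sum_P x_P + \sum_Q y_Q\\
&= \sum_{P,Q} \omega_{P,Q}\Big(\sum_{V,W} h_{V,W}\Big) \delta^{bH}(P,Q) + \sum_P x_P + \sum_Q y_Q + \sum_{P,Q} \omega_{P,Q}\Big(\sum_{V,W} h_{V,W} \cdot \delta^{bH}(V,W) + \sum_V w_V + \sum_W z_W\Big)
\end{aligned}$$

and, since $\sum_{V,W} h_{V,W} \leq 1$ and similarly $\sum_{P,Q} \omega_{P,Q} \leq 1$, we have:

$$\delta^{bH}(\llbracket UL \rrbracket, \llbracket KT \rrbracket) \leq \sum_{P,Q} \omega_{P,Q} \cdot \delta^{bH}(P,Q) + \sum_P x_P + \sum_Q y_Q + \sum_{V,W} h_{V,W} \cdot \delta^{bH}(V,W) + \sum_V w_V + \sum_W z_W.$$

We can now use equations (12) and (15):

$$\delta^{bH}(\llbracket UL \rrbracket, \llbracket KT \rrbracket) \leq \delta^{bH}(\llbracket U \rrbracket, \llbracket K \rrbracket) + \delta^{bH}(\llbracket L \rrbracket, \llbracket T \rrbracket) \leq \beta + \gamma.$$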


□ 
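The proof of Lemma 25 switches between two descriptions of the lifted distance: the maximisation over families $(a_s)_s, (b_s)_s$, and the description by couplings $\omega$ with leftover masses $x$ and $y$, as in equations (12) to (14). Any feasible coupling gives an upper bound and any feasible family $(a, b)$ a lower bound, so when the two values agree both are optimal. The Python sketch below checks this on finite (sub)distributions represented as dictionaries; the function names are illustrative and do not come from the paper.

```python
from fractions import Fraction


def primal_cost(omega, x, y, d):
    """Matched mass omega[(s, t)] pays d[(s, t)]; leftover mass x, y pays 1."""
    return (sum(m * d[pair] for pair, m in omega.items())
            + sum(x.values()) + sum(y.values()))


def primal_feasible(mu, nu, omega, x, y):
    """Non-negativity and the marginal equations, in the style of (13) and (14)."""
    masses = list(omega.values()) + list(x.values()) + list(y.values())
    if any(m < 0 for m in masses):
        return False
    left_ok = all(sum(m for (s2, _), m in omega.items() if s2 == s) + x.get(s, 0) == mu[s]
                  for s in mu)
    right_ok = all(sum(m for (_, t2), m in omega.items() if t2 == t) + y.get(t, 0) == nu[t]
                   for t in nu)
    return left_ok and right_ok


def dual_value(mu, nu, a, b, d):
    """Objective of the max-formulation, or None if (a, b) violates
    a_s <= 1, b_t <= 1 or a_s + b_t <= d(s, t) on the supports."""
    if any(a[s] > 1 for s in mu) or any(b[t] > 1 for t in nu):
        return None
    if any(a[s] + b[t] > d[(s, t)] for s in mu for t in nu):
        return None
    return sum(a[s] * mu[s] for s in mu) + sum(b[t] * nu[t] for t in nu)


# A Dirac distribution against a fair mixture (hypothetical distances).
mu = {"a": Fraction(1)}
nu = {"b": Fraction(1, 2), "c": Fraction(1, 2)}
d = {("a", "b"): Fraction(1, 2), ("a", "c"): Fraction(1, 2)}
omega = {("a", "b"): Fraction(1, 2), ("a", "c"): Fraction(1, 2)}
upper = primal_cost(omega, {}, {}, d)
lower = dual_value(mu, nu, {"a": Fraction(1, 2)}, {"b": Fraction(0), "c": Fraction(0)}, d)
```

Leftover mass matters for subdistributions: matching half a unit of mass against a full unit at distance 0 still costs the unmatched half.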


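Lemma 26 $\delta^{bH}(M\{V/x\}, N\{V/x\}) \leq \delta^{bH}(\lambda x.M, \lambda x.N)$.

Proof. Let $\varepsilon$ be such that $\vdash \delta^{bH}(\lambda x.M, \lambda x.N) \leq \varepsilon$. The only rule that can have been applied is:

$$\frac{x \vdash \delta^{bH}(M,K) \leq \gamma \qquad \delta^{b}(\lambda x.K, \lambda x.N) \leq \iota \qquad \vdash \lambda x.N}{\vdash \delta^{bH}(\lambda x.M, \lambda x.N) \leq \varepsilon = \gamma + \iota}$$

We can now apply Lemma 22 to $x \vdash \delta^{bH}(M,K) \leq \gamma$, and we see that $\delta^{bH}(M\{V/x\}, K\{V/x\}) \leq \delta^{bH}(M,K) \leq \gamma$ (as $\delta^{bH}(V,V) \leq \delta^{b}(V,V) = 0$ by Lemma 20). Moreover, we know that $\delta^{b}(\lambda x.K, \lambda x.N) \leq \iota$. Since $\delta^{b}$ is a fixpoint of $F$, we can see that:

$$\iota \geq \delta^{b}(\lambda x.K, \lambda x.N) = \delta^{b}(\widehat{\lambda x.K}, \widehat{\lambda x.N}) \geq \delta^{b}(K\{V/x\}, N\{V/x\}),$$

and now we can conclude by Lemma 21 that $\delta^{bH}(M\{V/x\}, N\{V/x\}) \leq \gamma + \iota = \varepsilon$. □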

Now we extend these two lemmas to $\overline{\delta^{bH}}$:

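Lemma 27 Let $M, N$ be two terms. Then

$$\overline{\delta^{bH}}(\llbracket M \rrbracket, \llbracket N \rrbracket) \leq \overline{\delta^{bH}}(M,N).$$

Proof. Let $\varepsilon$ be such that the judgement $\vdash \overline{\delta^{bH}}(M,N) \leq \varepsilon$ is valid by the rules of Figure 6. We show by induction on the structure of its derivation that $\overline{\delta^{bH}}(\llbracket M \rrbracket, \llbracket N \rrbracket) \leq \varepsilon$. We consider different cases depending on the structure of the proof tree used to derive the validity of $\vdash \overline{\delta^{bH}}(M,N) \leq \varepsilon$: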

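• If the proof tree is:

$$\frac{\delta^{bH}(M,N) \leq \varepsilon}{\vdash \overline{\delta^{bH}}(M,N) \leq \varepsilon}$$

we can use Lemma 25 and obtain that $\delta^{bH}(\llbracket M \rrbracket, \llbracket N \rrbracket) \leq \varepsilon$. Now we can see that:

$$\begin{aligned}
\overline{\delta^{bH}}(\llbracket M \rrbracket, \llbracket N \rrbracket) &\leq \delta^{bH}(\llbracket M \rrbracket, \llbracket N \rrbracket) && \text{since } \overline{\delta^{bH}} \leq \delta^{bH} \text{ on } \mathcal{S}_{\oplus}\\
&\leq \varepsilon && \text{by Lemma 25 and the construction of the extension of } \delta^{bH} \text{ to } \mathcal{S}_{\oplus}.
\end{aligned}$$

• If the proof tree is of the form:

$$\frac{\vdash \overline{\delta^{bH}}(M,N) \leq \gamma \qquad \vdash \overline{\delta^{bH}}(N,M) \leq \iota}{\vdash \overline{\delta^{bH}}(M,N) \leq \varepsilon = \min(\gamma,\iota)}$$

we can apply the induction hypothesis to $\vdash \overline{\delta^{bH}}(M,N) \leq \gamma$ and to $\vdash \overline{\delta^{bH}}(N,M) \leq \iota$. We obtain that $\overline{\delta^{bH}}(\llbracket M \rrbracket, \llbracket N \rrbracket) \leq \gamma$ and that $\overline{\delta^{bH}}(\llbracket N \rrbracket, \llbracket M \rrbracket) \leq \iota$. Since $\overline{\delta^{bH}}$ is symmetric, this means that $\overline{\delta^{bH}}(\llbracket M \rrbracket, \llbracket N \rrbracket) \leq \iota$, and so we have the result.

• If the proof tree is of the form:

$$\frac{\vdash \overline{\delta^{bH}}(M,s) \leq \gamma \qquad \vdash \overline{\delta^{bH}}(s,N) \leq \iota}{\vdash \overline{\delta^{bH}}(M,N) \leq \varepsilon = \gamma + \iota}$$

if $\varepsilon = 1$, the result holds. Otherwise, observe that $s$ cannot be a distinguished value, since any bound relating a term to a distinguished value is at least 1. So there exists a closed term $L$ such that $s = L$. By the induction hypothesis, $\overline{\delta^{bH}}(\llbracket M \rrbracket, \llbracket L \rrbracket) \leq \gamma$ and $\overline{\delta^{bH}}(\llbracket L \rrbracket, \llbracket N \rrbracket) \leq \iota$. So, since $\overline{\delta^{bH}}$ satisfies the triangle inequality, and hence so does its lifting to distributions, we have $\overline{\delta^{bH}}(\llbracket M \rrbracket, \llbracket N \rrbracket) \leq \gamma + \iota$.

□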


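Lemma 28 For all $M, N$:

$$\overline{\delta^{bH}}(M\{V/x\}, N\{V/x\}) \leq \overline{\delta^{bH}}(\lambda x.M, \lambda x.N).$$

Proof. Let $\varepsilon$ be such that the judgement $\vdash \overline{\delta^{bH}}(\lambda x.M, \lambda x.N) \leq \varepsilon$ is valid. As for the previous lemma, the proof is by induction on the structure of the proof tree for this judgement. □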


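Since $\delta^{bH}$ is non-expansive by construction, we now have the result we were aiming for:

Theorem 6 $\delta^{b}$ is non-expansive.

Proof. As a consequence of Theorem 5, $\overline{\delta^{bH}} = \delta^{b}$; since $\overline{\delta^{bH}} \leq \delta^{bH} \leq \delta^{b}$ (the latter by Lemma 20), $\delta^{bH}$ coincides with $\delta^{b}$ on programs. Since $\delta^{bH}$ is non-expansive (Lemma 23), the result holds.

□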

5.3 On Full-Abstraction and Pairs 

The bisimulation distance is a sound approximation of the context distance. But what about full abstraction? Is there any hope of proving that the two coincide? The answer is negative: there are terms whose distance is strictly higher in the bisimulation metric than in the context (or trace) metric.






Example 4 Consider the following terms: $M$ is the program that takes an argument and then returns $I$ with probability $\frac{1}{2}$ and diverges with probability $\frac{1}{2}$; $N$ is the program which first chooses between the function which returns $I$ whenever it is called and the function which diverges whenever it is called. Formally:

$$M := \lambda x.(I \oplus \Omega); \qquad N := (\lambda x.I) \oplus (\lambda x.\Omega).$$

These two terms are at distance 0 for the context distance: since the calculus is linear, the step at which the choice is made is irrelevant. However, $\delta^{b}(M,N) = \frac{1}{2}$. For a proof, one can use the characterisation of the bisimulation distance by testing, in which not only linear tests but also more complicated tests (like threshold tests) are available.
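The values in Example 4 can be checked with exact fractions. The Python sketch below hard-codes, as assumptions, the only facts it uses: after $\mathit{eval}$, $M$ reaches $\widehat{\lambda x.(I \oplus \Omega)}$ with probability 1, $N$ reaches $\widehat{\lambda x.I}$ and $\widehat{\lambda x.\Omega}$ with probability $\frac{1}{2}$ each, and applying these values to an argument and evaluating converges with probability $\frac{1}{2}$, 1 and 0 respectively. The string labels are only names for these distinguished values.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# Distributions over distinguished values reached by `eval` (hard-coded).
after_eval_M = {"lam x.(I+Omega)": Fraction(1)}
after_eval_N = {"lam x.I": HALF, "lam x.Omega": HALF}

# Probability that applying the value to an argument, then evaluating, converges.
converges = {"lam x.(I+Omega)": HALF, "lam x.I": Fraction(1), "lam x.Omega": Fraction(0)}


def trace_success(dist):
    """Success probability of the linear test eval . @V . eval."""
    return sum(p * converges[v] for v, p in dist.items())


def coupled_distance(dirac_value, dist):
    """Kantorovich distance between a Dirac distribution and `dist` (mass 1):
    the cheapest coupling sends all of dist's mass onto dirac_value. Assumes
    the distance between two of these values is the gap between their
    convergence probabilities, which holds for these particular values."""
    return sum(p * abs(converges[dirac_value] - converges[v]) for v, p in dist.items())


trace_gap = abs(trace_success(after_eval_M) - trace_success(after_eval_N))
bisim_gap = coupled_distance("lam x.(I+Omega)", after_eval_N)
```

The linear test cannot separate the two terms, while the coupling forced by the Dirac distribution on the side of $M$ pays $\frac{1}{2}$.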

But what about pairs? For the sake of simplicity, we have presented the metatheory of the bisimulation metric for a purely applicative $\lambda$-calculus. Following the lines of our discussion in Section 2, however, the LMC can be extended into one handling pairs in a relatively simple way. Unfortunately, the difficulties we encountered when trying to evaluate the (trace or context) distance between pairs of terms remain: it is not clear whether coinduction could provide any additional advantage over the context distance. As for the trace metric in the previous section, we would like to extend the bisimulation metric to a language with pairs. To do that, we add the action $\otimes L$ to the LMC, and we extend the definition of the probability matrix by adding:

$$\mathcal{P}_{\oplus}((M,N), \otimes L) = \sum_{V,W} \llbracket M \rrbracket(V) \cdot \llbracket N \rrbracket(W) \cdot \{L\{V,W/x,y\}^{1}\}$$

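We now have to modify the definition of validity of Howe judgements in order to handle pairs:

$$\frac{\Gamma \vdash \mu^{H}(M,K) \leq \varepsilon \qquad \Delta \vdash \mu^{H}(N,T) \leq \gamma \qquad \mu((K,T),L) \leq \iota \qquad \Gamma,\Delta \vdash L}{\Gamma,\Delta \vdash \mu^{H}((M,N),L) \leq \varepsilon + \gamma + \iota}$$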

6 The Tuple Distance 

The two metrics we have just defined have been shown to be non-expansive, even if the calculus is extended with pairs. In that case, however, they are not much of an improvement over the context distance. Recall where the problem comes from: we would like to define actions starting from $(M,N)$ which respect the affine paradigm. We have seen that taking projections as actions leads to an unsound metric, and we have circumvented the problem by considering an action $\otimes L$, following [8]. Intuitively, the action $\otimes L$ corresponds to replacing the free variables of $L$ (which are supposed to be included in $\{x,y\}$) by the components of the pair: if, for instance, $V$ and $W$ are values, we have that $(V,W) \xrightarrow{\otimes L} \{L\{V,W/x,y\}^{1}\}$. But what can an environment $L$ do if we give it $V$ and $W$ as two values to interact with? Let us suppose that both $V$ and $W$ are functions, and remember that we are in an affine setting. The environment can (probabilistically) pass some arguments to $V$, and independently some other arguments to $W$, and then possibly pass to one of the two programs an argument that contains the other one. The idea behind the construction we present in this section, then, is to keep the information about the two components of the pair in the states until they really interact with each other.

Our idea can be made concrete by introducing another LMC, whose states are no longer closed terms, but tuples of the form $[V_1,\ldots,V_n]$, where $V_1,\ldots,V_n$ are values. The possible actions the environment can perform on a tuple $[V_1,\ldots,V_n]$ correspond to the choice of an index $i \in \{1,\ldots,n\}$ and of an action to apply to the value $V_i$. If $V_i$ is a pair, the only possible action is to split it into its two components; we call this action $\mathsf{unfold}^{i}$. If $V_i$ is a function, the environment can pass it an argument, which can possibly be constructed using other $V_j$'s. More precisely, the argument is built by way of an open term $C$ and a typing context $\Gamma$ such that $\Gamma \vdash C$ and $\Gamma$ is a subset of $\{x_j \mid j \neq i\}$: the free variables of $C$ represent the places where the other values $V_j$, with $j \neq i$, are used. Moreover, we ask that for any values $W_1,\ldots,W_n$, the term obtained by substituting $x_j$ by $W_j$ is a value: this means that $C$ is one of the $x_j$, or of the form $\lambda y.D$. We call a pair $(\Gamma,C)$ which satisfies these conditions an $(n,i)$-open value. Formally, the LMC $\mathcal{M}_{\mathsf{mul}} = (\mathcal{S}_{\mathsf{mul}}, \mathcal{A}_{\mathsf{mul}}, \mathcal{P}_{\mathsf{mul}})$ is defined in Figure 7.

$$\mathcal{S}_{\mathsf{mul}} = \{[V_1,\ldots,V_n] \mid V_1,\ldots,V_n \text{ closed values}\}$$

$$\mathcal{A}_{\mathsf{mul}} = \{\mathsf{unfold}^{i} \mid i \in \mathbb{N}\} \cup \{@(\Gamma,C)^{i} \mid i \in \mathbb{N},\ (\Gamma,C) \text{ an } (n,i)\text{-open value}\}$$

$$\mathcal{P}_{\mathsf{mul}}([s_1,\ldots,(N,L),\ldots,s_n], \mathsf{unfold}^{i})([s_1,\ldots,s_{i-1},V,W,s_{i+1},\ldots,s_n]) = \llbracket N \rrbracket(V) \cdot \llbracket L \rrbracket(W)$$

$$\mathcal{P}_{\mathsf{mul}}([s_1,\ldots,\lambda y.N,\ldots,s_n], @(\Gamma,C)^{i})([s_{h_1},\ldots,W,\ldots,s_{h_m}]) = \llbracket N\{C\{s_{j_1},\ldots,s_{j_k}/x_{j_1},\ldots,x_{j_k}\}/y\} \rrbracket(W)$$

with $\{1,\ldots,n\} = \{i\} \uplus \{j_1,\ldots,j_k\} \uplus \{h_1,\ldots,h_m\}$ (disjoint union) and $\Gamma = \{x_{j_1},\ldots,x_{j_k}\}$.

Figure 7: The tuple LMC

6.1 The Metric 

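We are going to define a metric on closed terms which corresponds to linear tests in $\mathcal{M}_{\mathsf{mul}}$. First, we define tuple traces simply as words over $\mathcal{A}_{\mathsf{mul}}$. The probability of successfully performing a trace $s$ starting from a tuple $K \in \mathcal{S}_{\mathsf{mul}}$ can be naturally defined, and paves the way to defining a metric on tuples of values:

$$\mathit{Pr}^{\mathsf{mul}}(K, \epsilon) = 1;$$

$$\mathit{Pr}^{\mathsf{mul}}(K, a \cdot s) = \sum_{H \in \mathcal{S}_{\mathsf{mul}}} \mathcal{P}_{\mathsf{mul}}(K,a)(H) \cdot \mathit{Pr}^{\mathsf{mul}}(H, s);$$

$$\delta^{\mathsf{mul}}(K,H) = \sup_{s} \big|\mathit{Pr}^{\mathsf{mul}}(K,s) - \mathit{Pr}^{\mathsf{mul}}(H,s)\big|.$$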

S 

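What we need, however, is a metric on programs. Recall that the states of $\mathcal{M}_{\mathsf{mul}}$ are tuples of values. Any program $M$, however, can be viewed as the distribution of values obtained by evaluating it, i.e. its semantics $\llbracket M \rrbracket$:

$$\delta^{\mathsf{mul}}(M,N) = \sup_{s} \Big|\sum_{V} \llbracket M \rrbracket(V) \cdot \mathit{Pr}^{\mathsf{mul}}([V], s) - \sum_{W} \llbracket N \rrbracket(W) \cdot \mathit{Pr}^{\mathsf{mul}}([W], s)\Big|.$$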
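For a finite chain, the definitions of $\mathit{Pr}^{\mathsf{mul}}$ and $\delta^{\mathsf{mul}}$ can be transcribed almost literally. The Python sketch below is generic: it does not model tuples, values or the actual actions of $\mathcal{M}_{\mathsf{mul}}$, and the chain in the usage lines is hypothetical. Since the metric takes a supremum over all traces, the sketch only explores traces up to a fixed length, which yields a lower approximation. A Dirac distribution gives the distance on states, and the distribution of values reached by a program gives the distance on programs.

```python
from fractions import Fraction
from itertools import product

# A finite labelled Markov chain: trans[state][action] = {next_state: probability}.
# A missing entry means the action is not enabled in that state.


def pr(trans, state, trace):
    """Pr(K, empty) = 1 and Pr(K, a.s) = sum_H P(K, a)(H) * Pr(H, s)."""
    if not trace:
        return Fraction(1)
    successors = trans.get(state, {}).get(trace[0], {})
    return sum((p * pr(trans, h, trace[1:]) for h, p in successors.items()), Fraction(0))


def pr_from(trans, dist, trace):
    """Success probability of a trace from a distribution over initial states."""
    return sum((q * pr(trans, k, trace) for k, q in dist.items()), Fraction(0))


def distance_up_to(trans, actions, dist1, dist2, max_len):
    """Supremum over traces of length <= max_len only (a lower approximation)."""
    best = Fraction(0)
    for n in range(max_len + 1):
        for trace in product(actions, repeat=n):
            gap = abs(pr_from(trans, dist1, trace) - pr_from(trans, dist2, trace))
            best = max(best, gap)
    return best


# Hypothetical chain: "u" enables "a" only half of the time, "w" always.
trans = {"u": {"a": {"v": Fraction(1, 2)}},
         "v": {"b": {"v": Fraction(1)}},
         "w": {"a": {"v": Fraction(1)}}}
```

For instance, `distance_up_to(trans, ["a", "b"], {"u": Fraction(1)}, {"w": Fraction(1)}, 3)` compares two states, while passing mixtures such as `{"u": Fraction(1, 2), "w": Fraction(1, 2)}` compares value distributions.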

The metric just introduced should at least be related to the context metric for it to be useful. We have already shown that the context metric coincides with the trace metric. The following theorem relates the trace metric $\delta^{tr}$ and the metric $\delta^{\mathsf{mul}}$:

Theorem 7 Let $I$ be any finite set of variables, and $\{V_x\}_{x \in I}$, $\{W_x\}_{x \in I}$ any two collections of values. For any open term $C$ such that $I \vdash C$, it holds that:

$$\delta^{tr}\big(C\{V_x/x\}_{x \in I},\, C\{W_x/x\}_{x \in I}\big) \leq \delta^{\mathsf{mul}}\big([V_x]_{x \in I},\, [W_x]_{x \in I}\big).$$

Proof. The proof of Theorem 7 is similar to the proof of non-expansiveness for the trace metric. First we define a small-step semantics which corresponds to the transition relation in the Markov chain $\mathcal{M}_{\mathsf{mul}}$; then we define another small-step semantics which keeps separate the context, now seen as a term with several holes, and the tuple used to fill it; and we end the proof by defining a notion of $\varepsilon$-parentality for distributions over pairs of contexts and tuples, and by showing a stability result for $\varepsilon$-parent distributions. These steps are displayed in more detail below.
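6.1.1 Trace Semantics Big Steps for Tuples

We are interested in the labelled transition system on finite distributions over $\mathcal{S}_{\mathsf{mul}}$ induced by the Markov chain $\mathcal{M}_{\mathsf{mul}}$.

Definition 20 We write $\Delta(\mathcal{S}_{\mathsf{mul}})$ for the set of finite distributions over $\mathcal{S}_{\mathsf{mul}}$. We define a reduction relation $\mathcal{D} \xrightarrow{a} \mathcal{E}$, where $\mathcal{D}, \mathcal{E} \in \Delta(\mathcal{S}_{\mathsf{mul}})$ and $a \in \mathcal{A}_{\mathsf{mul}}$, by:

$$\mathcal{D} \xrightarrow{a} \sum_{K} \mathcal{D}(K) \cdot \mathcal{P}_{\mathsf{mul}}(K,a).$$

Now we define the success probability of a trace for a distribution:

Definition 21 If $s = a_1 \cdots a_n$ and $\mathcal{D} \xrightarrow{a_1} \cdots \xrightarrow{a_n} \mathcal{E}$, then

$$\mathit{Pr}^{\mathsf{mul}}_{\Delta}(\mathcal{D}, s) = \sum_{K} \mathcal{E}(K).$$

The relation between this deterministic labelled transition system and the Markov chain $\mathcal{M}_{\mathsf{mul}}$ is expressed by the following lemma:

Lemma 29 Let $K \in \mathcal{S}_{\mathsf{mul}}$ and let $s$ be a trace. Then $\mathit{Pr}^{\mathsf{mul}}_{\Delta}(\{K^{1}\}, s) = \mathit{Pr}^{\mathsf{mul}}(K, s)$.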
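Lemma 29 can be sanity-checked on a small example. The Python sketch below, on a hypothetical finite chain with string-named states and actions, lifts one action to distributions as in Definition 20, computes the success probability of Definition 21 as the total mass left after the trace, and compares it with the state-based recursion applied to a Dirac distribution.

```python
from fractions import Fraction


def step(trans, dist, action):
    """D --a--> E with E = sum_K D(K) * P(K, a); states on which the
    action is not enabled contribute no mass."""
    out = {}
    for k, q in dist.items():
        for h, p in trans.get(k, {}).get(action, {}).items():
            out[h] = out.get(h, Fraction(0)) + q * p
    return out


def pr_delta(trans, dist, trace):
    """Definition 21: run the whole trace on the distribution, then sum it."""
    for action in trace:
        dist = step(trans, dist, action)
    return sum(dist.values(), Fraction(0))


def pr_state(trans, state, trace):
    """State-based recursion: Pr(K, empty) = 1, Pr(K, a.s) = sum_H P(K, a)(H) * Pr(H, s)."""
    if not trace:
        return Fraction(1)
    successors = trans.get(state, {}).get(trace[0], {})
    return sum((p * pr_state(trans, h, trace[1:]) for h, p in successors.items()), Fraction(0))


# Hypothetical chain used only to exercise the definitions.
trans = {"K": {"a": {"H1": Fraction(1, 3), "H2": Fraction(2, 3)}},
         "H1": {"b": {"K": Fraction(1)}},
         "H2": {"b": {"H1": Fraction(1, 2)}}}
```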

6.1.2 Trace Semantics Small Steps for Tuples 

We would now like a notion of small-step semantics for tuples corresponding to the trace semantics of the Markov chain $\mathcal{M}_{\mathsf{mul}}$. Since we now work with small steps, we should consider not only tuples of values, but tuples of terms as well. Moreover, during the execution, we should remember which term of the tuple is being reduced. For this reason, we must add intermediate states, where there is an explicit focus on the terms being evaluated.

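Definition 22

• We define a set $TV$ consisting of closed terms and distinguished values: $TV = \{M \mid M \text{ closed term}\} \cup \{\widehat{V} \mid V \text{ closed value}\}$. Then we define the corresponding set of tuples $\mathcal{S}_{[\,]} = \{[s_1,\ldots,s_n] \mid s_1,\ldots,s_n \in TV\}$.

• $TV^{\mathsf{focus}} = \{M \mid M \text{ closed term}\} \cup \{\mathsf{focus}^{i}(M) \mid M \text{ closed term},\ i \in \mathbb{N}\} \cup \{\widehat{V} \mid V \text{ closed value}\}$, and $\mathcal{S}^{\mathsf{focus}}_{[\,]} = \{[s_1,\ldots,s_n] \mid s_1,\ldots,s_n \in TV^{\mathsf{focus}},\ \text{the focus indices are pairwise distinct}\}$.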

The term which should be reduced first is the one with the smallest focus index. This is the meaning of the following definition.

Definition 23 For any $K \in \mathcal{S}^{\mathsf{focus}}_{[\,]}$, $f(K)$ is defined as follows:

• $f(K) = \infty$ if $K$ has no element with a focus;

• $f([s_1,\ldots,s_n]) = i$ such that $s_i = \mathsf{focus}^{j}(M)$ and $j$ is the smallest focus index in $K$.
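A Python sketch of $f$, under an encoding chosen only for illustration: a focused entry $\mathsf{focus}^{j}(M)$ is the triple `("focus", j, M)`, every other entry (an unfocused term or a distinguished value) is a plain string, and positions are 1-based as in the text. Since focus indices are pairwise distinct, the minimum is unique.

```python
import math


def focus_position(tuple_k):
    """f(K): the 1-based position of the entry with the smallest focus index,
    or infinity when no entry carries a focus."""
    focused = [(entry[1], position)
               for position, entry in enumerate(tuple_k, start=1)
               if isinstance(entry, tuple) and len(entry) == 3 and entry[0] == "focus"]
    if not focused:
        return math.inf
    _, position = min(focused)  # smallest focus index j wins
    return position
```

For example, `focus_position(["M", ("focus", 2, "N"), ("focus", 1, "L")])` points at the third entry, whose focus index is 1.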

We now define a small-step probabilistic labelled reduction relation, where the actions can be divided into two kinds:

Definition 24 We define a labelled reduction relation $K \xrightarrow{a}_{[\,]} \mathcal{D}$, where $K \in \mathcal{S}^{\mathsf{focus}}_{[\,]}$, $\mathcal{D}$ is a distribution over $\mathcal{S}^{\mathsf{focus}}_{[\,]}$, and $a \in \mathit{Act}_{[\,]} = \{\tau\} \cup \{\mathsf{eval}^{i} \mid i \in \mathbb{N}\} \cup \{\mathsf{unfold}^{i} \mid i \in \mathbb{N}\} \cup \{@(\Gamma,C)^{i} \mid i \in \mathbb{N},\ (\Gamma,C) \text{ an } (n,i)\text{-open value for some } n \in \mathbb{N}\}$. The rules are given in Figure 8.

$\tau$ is called an internal action, and corresponds to the internal reduction of the terms under focus in the tuple. The other actions are called external actions, and correspond to interactions with the environment. The definition given in Figure 8 uses the small-step semantics $\rightarrow$ for terms.

We want to formalise the probability of performing a trace for a distribution. First we lift the trace semantics to a (non-probabilistic) reduction on distributions. We write $\Delta(\mathcal{S}^{\mathsf{focus}}_{[\,]})$ for the set of finite distributions over $\mathcal{S}^{\mathsf{focus}}_{[\,]}$.
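$$\frac{f([s_1,\ldots,\mathsf{focus}^{j}(M),\ldots,s_n]) = i \qquad M \rightarrow \mathcal{D}}{[s_1,\ldots,\mathsf{focus}^{j}(M),\ldots,s_n] \xrightarrow{\tau}_{[\,]} \sum_{N} \mathcal{D}(N) \cdot \{[s_1,\ldots,\mathsf{focus}^{j}(N),\ldots,s_n]^{1}\}}$$

$$\frac{f([s_1,\ldots,\mathsf{focus}^{j}(M),\ldots,s_n]) = i \qquad M \not\rightarrow \qquad M \text{ is not a value}}{[s_1,\ldots,\mathsf{focus}^{j}(M),\ldots,s_n] \xrightarrow{\tau}_{[\,]} 0}$$

$$\frac{f([s_1,\ldots,\mathsf{focus}^{j}(V),\ldots,s_n]) = i \qquad V \text{ is a value}}{[s_1,\ldots,\mathsf{focus}^{j}(V),\ldots,s_n] \xrightarrow{\tau}_{[\,]} \{[s_1,\ldots,\widehat{V},\ldots,s_n]^{1}\}}$$

$$\frac{f([s_1,\ldots,s_n]) = \infty}{[s_1,\ldots,s_{i-1},M,s_{i+1},\ldots,s_n] \xrightarrow{\mathsf{eval}^{i}}_{[\,]} \{[s_1,\ldots,s_{i-1},\mathsf{focus}^{1}(M),s_{i+1},\ldots,s_n]^{1}\}}$$

$$\frac{f([s_1,\ldots,s_n]) = \infty}{[s_1,\ldots,s_{i-1},\widehat{(M,N)},s_{i+1},\ldots,s_n] \xrightarrow{\mathsf{unfold}^{i}}_{[\,]} \{[s_1,\ldots,s_{i-1},\mathsf{focus}^{1}(M),\mathsf{focus}^{2}(N),s_{i+1},\ldots,s_n]^{1}\}}$$

$$\frac{f([s_1,\ldots,s_n]) = \infty}{[s_1,\ldots,s_{i-1},\widehat{\lambda y.M},s_{i+1},\ldots,s_{j-1},\widehat{V},s_{j+1},\ldots,s_n] \xrightarrow{@(\{x_j\},x_j)^{i}}_{[\,]} \{[s_1,\ldots,s_{i-1},\mathsf{focus}^{1}(M\{V/y\}),s_{i+1},\ldots,s_{j-1},s_{j+1},\ldots,s_n]^{1}\}}$$

$$\frac{f([s_1,\ldots,s_n]) = \infty \qquad (x_{j_k})_{1 \leq k \leq l} \vdash D \qquad \{1,\ldots,n\} = \{i\} \uplus \{j_k \mid 1 \leq k \leq l\} \uplus \{h_k \mid 1 \leq k \leq m\}}{[s_1,\ldots,s_{i-1},\widehat{\lambda y.M},s_{i+1},\ldots,s_n] \xrightarrow{@(\Gamma,\lambda z.D)^{i}}_{[\,]} \{[s_{h_1},\ldots,\mathsf{focus}^{1}(M\{\lambda z.D\{s_{j_k}/x_{j_k}\}_{k}/y\}),\ldots,s_{h_m}]^{1}\}}$$

Figure 8: Small-step trace semantics for tuples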


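$$\frac{K \xrightarrow{\tau}_{[\,]} \mathcal{E}}{\mathcal{D} + p \cdot \{K^{1}\} \rightarrow_{\Delta([\,])} \mathcal{D} + p \cdot \mathcal{E}}$$

$$\frac{\mathcal{D} \text{ in normal form} \qquad K \xrightarrow{a}_{[\,]} \mathcal{E}_K \text{ for every } K \text{ such that } f(K) = \infty}{\mathcal{D} \overset{a}{\Longrightarrow}_{\Delta([\,])} \sum_{K \text{ s.t. } f(K) = \infty} \mathcal{D}(K) \cdot \mathcal{E}_K}$$

$$\frac{\mathcal{D} \rightarrow_{\Delta([\,])} \mathcal{E} \qquad \mathcal{E} \overset{a}{\Longrightarrow}_{\Delta([\,])} \mathcal{G}}{\mathcal{D} \overset{a}{\Longrightarrow}_{\Delta([\,])} \mathcal{G}}$$

Figure 9: Small-step trace semantics on distributions of tuples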
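Definition 25 We define a labelled relation $\mathcal{D} \overset{a}{\Longrightarrow}_{\Delta([\,])} \mathcal{E}$, where $\mathcal{D}, \mathcal{E} \in \Delta(\mathcal{S}^{\mathsf{focus}}_{[\,]})$ and $a \in \mathit{Act}_{[\,]}$. The rules are given in Figure 9.

Definition 26 $\mathit{Pr}^{\mathsf{mul}}_{[\,]}(\mathcal{D}, s) = \max\big\{\sum_{f(K) = \infty} \mathcal{E}(K) \;\big|\; \mathcal{D} \overset{s}{\Longrightarrow}_{\Delta([\,])} \mathcal{E}\big\}$.

Observe that for any (external or internal) action $a$, the relations (between tuples and distributions over tuples) $K \xrightarrow{a}_{[\,]} \mathcal{D}$ and $K \xrightarrow{a} \mathcal{D}$ are deterministic. This is no longer the case when we lift them to relations between distributions, but we have the following lemma:

Lemma 30 The reduction $\rightarrow_{\Delta([\,])}$ on distributions over $\mathcal{S}^{\mathsf{focus}}_{[\,]}$ is strongly normalising.

Proof. It follows from the fact that the reduction relation $\rightarrow$ over distributions of terms is strongly normalising. □

We write $\mathcal{D}^{*}$ for the normal form of $\mathcal{D}$ for the relation $\rightarrow_{\Delta([\,])}$. By abuse of notation, if $s \in TV^{\mathsf{focus}}$, we write $s^{*}$ for $\{s^{1}\}^{*}$. We can in fact be more precise about the shape of the normal form of a distribution: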

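Definition 27 Let $s \in TV^{\mathsf{focus}}$. We define $\langle s^{*} \rangle$ by:

• if $s = \mathsf{focus}^{i}(M)$, then $\langle s^{*} \rangle = \sum_{V} \llbracket M \rrbracket(V) \cdot \{\widehat{V}^{1}\}$;

• otherwise, $\langle s^{*} \rangle = \{s^{1}\}$.

Lemma 31 Let $K = [s_1,\ldots,s_n] \in \mathcal{S}^{\mathsf{focus}}_{[\,]}$. Then $K^{*} = \sum_{t_1,\ldots,t_n} \langle s_1^{*} \rangle(t_1) \cdots \langle s_n^{*} \rangle(t_n) \cdot \{[t_1,\ldots,t_n]^{1}\}$.

Proof. The proof is by induction on the maximal number of reduction steps from $\mathcal{D}$ to $\mathcal{D}^{*}$ (which is well defined since $\rightarrow_{\Delta([\,])}$ is strongly normalising). □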
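Lemma 31 describes the normal form of a tuple componentwise: read as a product of the component normal forms, it says the components are evaluated independently. The Python sketch below builds such a product from per-component (sub)distributions given as dictionaries; the encoding, the function name and the component distributions in the tests (for instance a focused term reaching $\widehat{V}$ with probability $\frac{1}{2}$ and diverging otherwise) are hypothetical.

```python
from fractions import Fraction
from itertools import product


def tuple_normal_form(component_dists):
    """Product of the per-component distributions <s_i*>: the probability of
    [t_1, ..., t_n] is the product of the probabilities of each t_i.
    A component with empty distribution (divergence) empties the result."""
    result = {}
    for choice in product(*(list(dist.items()) for dist in component_dists)):
        entries = tuple(t for t, _ in choice)
        prob = Fraction(1)
        for _, p in choice:
            prob *= p
        result[entries] = result.get(entries, Fraction(0)) + prob
    return result
```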

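Now we want to compare the probability of performing a trace in the small-step semantics and in the big-step semantics. To do so, we first show the following lemma:

Lemma 32 Let $a \in \mathcal{A}_{\mathsf{mul}}$ and $\mathcal{D} \in \Delta(\mathcal{S}_{\mathsf{mul}})$. Let $\mathcal{E}$ be the distribution over $\mathcal{S}_{\mathsf{mul}}$ such that $\mathcal{D} \xrightarrow{a} \mathcal{E}$, and let $\mathcal{G}$ be the distribution over $\mathcal{S}^{\mathsf{focus}}_{[\,]}$ such that $\mathcal{D} \overset{a}{\Longrightarrow}_{\Delta([\,])} \mathcal{G} \rightarrow_{\Delta([\,])} \cdots \rightarrow_{\Delta([\,])} \mathcal{G}^{*}$. Then:

$$\mathcal{E} = \mathcal{G}^{*}.$$

Proof. Let $a \in \mathcal{A}_{\mathsf{mul}}$. We can see that for every $K \in \mathcal{S}_{\mathsf{mul}}$ there exists a unique $H \in \mathcal{S}^{\mathsf{focus}}_{[\,]}$ such that $K \xrightarrow{a}_{[\,]} \{H^{1}\}$. It is sufficient to show that, if $\mathcal{E}$ is the distribution over $\mathcal{S}_{\mathsf{mul}}$ such that $\{K^{1}\} \xrightarrow{a} \mathcal{E}$, then $H^{*} = \mathcal{E}$. This is proved by case analysis on the rules of $\xrightarrow{\cdot}_{[\,]}$, using the characterisation of the normal form given in Lemma 31.

□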

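Now we can extend this result to traces:

Lemma 33 Let $s$ be a word over $\mathcal{A}_{\mathsf{mul}}$ and $\mathcal{D}$ a distribution over $\mathcal{S}_{\mathsf{mul}}$. Then $\mathit{Pr}^{\mathsf{mul}}_{[\,]}(\mathcal{D}, s) = \mathit{Pr}^{\mathsf{mul}}_{\Delta}(\mathcal{D}, s)$.

Proof. The proof is by induction on the length of $s$.

• If $s = \epsilon$: $\mathit{Pr}^{\mathsf{mul}}_{\Delta}(\mathcal{D}, \epsilon) = \sum \mathcal{D}$. Since $\mathcal{D}$ is a normal form for $\rightarrow_{\Delta([\,])}$, we have $\mathit{Pr}^{\mathsf{mul}}_{[\,]}(\mathcal{D}, \epsilon) = \sum_{f(K) = \infty} \mathcal{D}(K) = \sum \mathcal{D}$.

• If $s = a \cdot t$, let $\mathcal{E}$ be such that $\mathcal{D} \xrightarrow{a} \mathcal{E}$. Then $\mathit{Pr}^{\mathsf{mul}}_{\Delta}(\mathcal{D}, s) = \mathit{Pr}^{\mathsf{mul}}_{\Delta}(\mathcal{E}, t)$. We apply Lemma 32 and obtain that $\mathcal{D} \overset{a}{\Longrightarrow}_{\Delta([\,])} \mathcal{G}$ with $\mathcal{G}^{*} = \mathcal{E}$. Moreover, by the induction hypothesis:

$$\mathit{Pr}^{\mathsf{mul}}_{[\,]}(\mathcal{D}, s) = \mathit{Pr}^{\mathsf{mul}}_{[\,]}(\mathcal{G}^{*}, t) = \mathit{Pr}^{\mathsf{mul}}_{\Delta}(\mathcal{E}, t).$$

□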
6.1.3 Trace Semantics for Distributions over Contexts and Tuples

Here we consider the same traces as those used to define the trace semantics for distributions over closed terms. We first introduce some useful notation:

Definition 28 We define an operator $(\phi \mid^{n} \psi)$ on functions as follows. Let $\phi : A \to \mathbb{N}$ and $\psi : B \to \mathbb{N}$ be such that:

• $A \cap B = \emptyset$;

• $\mathit{Im}(\phi) \subseteq \{1,\ldots,n\}$.

Then $(\phi \mid^{n} \psi) : A \cup B \to \mathbb{N}$ is defined by:

• $(\phi \mid^{n} \psi)(x) = \phi(x)$ if $x \in A$;

• $(\phi \mid^{n} \psi)(x) = n + \psi(x)$ if $x \in B$.
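The operator is what allows two (context, tuple) pairs to be concatenated: the tuple of the second is appended after the first $n$ entries, so the indices of $\psi$ are shifted by $n$. A Python sketch, with finite maps encoded as dictionaries from variable names to 1-based positions (an encoding chosen for illustration; the name `join` is ours):

```python
def join(phi, psi, n):
    """(phi |^n psi): keep phi on its domain and shift psi by n, so that the
    indices of psi point past the first n positions of the tuple."""
    if set(phi) & set(psi):
        raise ValueError("the domains of phi and psi must be disjoint")
    if any(not 1 <= i <= n for i in phi.values()):
        raise ValueError("phi must take values in {1, ..., n}")
    merged = dict(phi)
    merged.update({x: n + i for x, i in psi.items()})
    return merged
```

For instance, with $n = 2$, a variable mapped by $\psi$ to the first entry of the second tuple ends up pointing at position 3 of the concatenated tuple.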

We now want to define a set of pairs consisting of a context with several holes and a tuple used to fill these holes. We first define things for the untyped case (without pairs):

Definition 29

• Let $\phi$ be a partial injective function from variables to $\mathbb{N}$, $C$ an (open) term, and $[s_1,\ldots,s_n]$ an element of $\mathcal{S}^{\mathsf{focus}}_{[\,]}$. We define the judgement $x_1,\ldots,x_m \vdash (C, \phi, [s_1,\ldots,s_n])$ by:
 - $\{x_1,\ldots,x_m\} \cap \mathit{Dom}(\phi) = \emptyset$;
 - $x_1,\ldots,x_m, (y)_{y \in \mathit{Dom}(\phi)} \vdash C$;
 - $\mathit{Im}(\phi) \subseteq \{1,\ldots,n\}$.

• We define the judgement $x_1,\ldots,x_m \vdash (C, \phi, [s_1,\ldots,s_n]) : \mathsf{val}$ by:
 - $x_1,\ldots,x_m \vdash (C, \phi, [s_1,\ldots,s_n])$;
 - if $C = y$, then there exists a value $V$ such that $s_{\phi(y)} = \widehat{V}$;
 - if $C$ is not a variable, then $C$ is an abstraction (that is, $C = \lambda y.D$, where $D$ is an open term) or a pair (that is, $C = (D_1, D_2)$, where $D_1$ and $D_2$ are open terms).

• We define the set of well-formed pairs of contexts and tuples:

$$\mathbb{A} = \{(C, \phi, [s_1,\ldots,s_n]) \text{ such that } \emptyset \vdash (C, \phi, [s_1,\ldots,s_n])\}.$$

• We define a notion of congruence for elements of $\mathbb{A}$: for every permutation $\theta : \{1,\ldots,n\} \to \{1,\ldots,n\}$, $(C, \phi, [s_1,\ldots,s_n]) \equiv (C, \theta^{-1} \circ \phi, [s_{\theta(1)},\ldots,s_{\theta(n)}])$.
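The congruence only reorders the tuple: composing $\phi$ with $\theta^{-1}$ makes every variable of $C$ point at the same entry as before. The Python sketch below checks this on a small example; permutations and maps are dictionaries on 1-based positions, and the encoding and function names are illustrative assumptions.

```python
def permute(phi, tup, theta):
    """Apply a permutation theta of {1, ..., n}: the new tuple is
    [s_theta(1), ..., s_theta(n)] and the new map is theta^{-1} o phi."""
    n = len(tup)
    new_tup = [tup[theta[i] - 1] for i in range(1, n + 1)]
    theta_inv = {j: i for i, j in theta.items()}
    new_phi = {x: theta_inv[k] for x, k in phi.items()}
    return new_phi, new_tup


def filling(phi, tup):
    """Which tuple entry each variable of the context is mapped to."""
    return {x: tup[k - 1] for x, k in phi.items()}


phi = {"x": 1, "y": 3}
tup = ["A", "B", "C"]
theta = {1: 3, 2: 1, 3: 2}
new_phi, new_tup = permute(phi, tup, theta)
```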

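We need to modify the definition if we consider a typed calculus:

Definition 30

• We define the judgement $x_1 : \sigma_1,\ldots,x_m : \sigma_m \vdash (C, \phi, [M_1,\ldots,M_n]) : \tau$ by:
 - $\{x_1,\ldots,x_m\} \cap \mathit{Dom}(\phi) = \emptyset$;
 - $x_1 : \sigma_1,\ldots,x_m : \sigma_m, (y : \sigma_y)_{y \in \mathit{Dom}(\phi)} \vdash C : \tau$;
 - $\mathit{Im}(\phi) \subseteq \{1,\ldots,n\}$, and $\vdash M_{\phi(y)} : \sigma_y$ for every $y \in \mathit{Dom}(\phi)$.

• We define the set of well-formed pairs of contexts and tuples:

$$\mathbb{A}^{\sigma} = \{(C, \phi, [M_1,\ldots,M_n]) \mid\ \vdash (C, \phi, [M_1,\ldots,M_n]) : \sigma\}.$$

• We define a notion of congruence for elements of $\mathbb{A}^{\sigma}$: for every permutation $\theta : \{1,\ldots,n\} \to \{1,\ldots,n\}$, $(C, \phi, [M_1,\ldots,M_n]) \equiv (C, \theta^{-1} \circ \phi, [M_{\theta(1)},\ldots,M_{\theta(n)}])$.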
Figure 10: Small-step trace relation on distributions over $\mathbb{A}$ (without pairs)


In the following, we consider equivalence classes of $\equiv$. This corresponds to reordering the elements of the tuple, and modifying the function $\phi$ so as to keep the same mapping from the free variables of $C$.

We define a small-step semantics on elements of $\mathbb{A}$ by the rules of Figure 10.

Observe that the rules would be exactly the same for a strictly linear (that is, not affine) calculus. The only thing to change would be the definition of $\Gamma \vdash (C, \phi, K)$.

We need a definition of $\mathbb{A}$ taking into account possible other free variables, and we need to add rules specific to the language with pairs; they are given in Figure 11.

Figure 11: Small-step trace relation on distributions over $\mathbb{A}$ for pairs

Observe that there are two different sources of nondeterminism in the rules: the choice of the part of the distribution which is going to be reduced, and the way the tuple is divided (in the affine case). The second one is not really meaningful, since we have the following lemma:

Lemma 34 Suppose that $\vdash (C, \phi, K)$, and let $\mathscr{D}$, $\mathscr{E}$ be such that $\{(C, \phi, K)^1\} \to \mathscr{D}$ and $\{(C, \phi, K)^1\} \to \mathscr{E}$. Then $\mathscr{D} = \mathscr{E}$.

Proof. Let $\vdash (C, \phi, K)$. We first show the following result. Suppose that $K = [M_1, \ldots, M_q]$ and that $\phi(FV(C)) \subseteq \{1, \ldots, n\}$, and let $\mathscr{D}$ be such that $\{(C, \phi, K)^1\} \to \mathscr{D}$. Then there exists $\mathscr{E}$ such that $\{(C, \phi|_{FV(C)}, [M_1, \ldots, M_n])^1\} \to \mathscr{E}$ and

$$\mathscr{D} = \sum \mathscr{E}(D, \psi, [N_1, \ldots, N_p]) \cdot \{(D, \nu, [N_1, \ldots, N_p, M_{n+1}, \ldots, M_q])^1\},$$

with $\nu(x) = \psi(x)$ if $x \in FV(C)$, and $\nu(x) = p - n + \phi(x)$ otherwise. We show this by induction on the derivation of $\{(C, \phi, K)^1\} \to \mathscr{D}$.

Then it is sufficient to remark that, if the free variables of C correspond exactly to the terms in the tuple, only one rule can be applied. □

Definition 31 Let $\mathscr{D}$ be a distribution over A. We define $F(\mathscr{D})$, a distribution over closed terms, by:

$$F(\mathscr{D}) = \sum \mathscr{D}(C, \phi, [M_1, \ldots, M_n]) \cdot \{(C\{M_{\phi(x)}/x\}_{x \in \mathrm{Dom}(\phi)})^1\}.$$
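To make Definition 31 concrete, here is a minimal Python sketch of F for a finitely supported distribution. It assumes a hypothetical representation that is not taken from the paper: terms are nested tuples, $\phi$ is encoded as a tuple of (variable, index) pairs with 1-based indices as in the text, and the names `subst` and `F` are ours. Because the terms $M_i$ stored in the tuple are closed, substituting them cannot capture variables, so the substitution only has to respect shadowing.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical term representation (an assumption, not the paper's syntax):
#   ("var", x) | ("lam", x, body) | ("app", t, u) | ("pair", t, u) | ("const", name)

def subst(term, env):
    """Replace the free variables of `term` according to `env` (variable -> closed term).

    All terms in `env` are closed, so no capture can occur; we only stop
    substituting a variable underneath a binder that shadows it.
    """
    tag = term[0]
    if tag == "var":
        return env.get(term[1], term)
    if tag == "lam":
        _, x, body = term
        inner = {y: t for y, t in env.items() if y != x}
        return ("lam", x, subst(body, inner))
    if tag in ("app", "pair"):
        return (tag, subst(term[1], env), subst(term[2], env))
    return term  # constants are left unchanged


def F(dist):
    """Map a distribution over triples (C, phi, K) to a distribution over closed terms.

    `dist` maps hashable triples (C, phi, K) to probabilities; phi is a tuple of
    (variable, index) pairs, where index i (1-based) points to the i-th term of K.
    """
    out = defaultdict(Fraction)
    for (ctx, phi, tup), p in dist.items():
        env = {x: tup[i - 1] for x, i in phi}
        out[subst(ctx, env)] += p  # triples yielding the same term add up
    return dict(out)
```

Two different triples that produce the same closed term have their probabilities summed, which is why the sketch accumulates into a dictionary rather than building a list.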

We would like to know that, if a distribution on terms can perform a trace, then the corresponding distribution, in which contexts are split from the terms filling them, can perform the same trace. Unfortunately, we need to be more precise about how we split the distribution, and especially about which components of the tuple can be under focus. (For example, consider $(x, \{x \mapsto 1\}, [M, \mathsf{focus}(N)])$: it is not possible to evaluate M before having evaluated N.) So we define a notion of tuples coherent with a given context, the idea being that this context could have triggered the evaluation of the terms which are under focus:

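Lemma 35 Let $\mathscr{D}$ be a distribution over A, and $s$ a trace such that $F(\mathscr{D}) \xrightarrow{s} \mathscr{E}$. Then there exists $\mathscr{G}$ such that $\mathscr{D} \xrightarrow{s} \mathscr{G}$ and $F(\mathscr{G}) = \mathscr{E}$.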

The rules of the trace semantics for elements of A are designed to match those of the trace semantics for terms. More precisely:

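Lemma 36 Let $\vdash (C, \phi, [M_1, \ldots, M_n])$, and let $\mathscr{D}$ be such that $\{(C, \phi, [M_1, \ldots, M_n])^1\} \xrightarrow{\alpha} \mathscr{D}$. Then $\{(C\{M_{\phi(x)}/x\}_{x \in FV(C)})^1\} \xrightarrow{\alpha} F(\mathscr{D})$.

Proof. The proof is by case analysis of the derivation of $\{(C, \phi, [M_1, \ldots, M_n])^1\} \xrightarrow{\alpha} \mathscr{D}$. □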

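Lemma 37 Let $\mathscr{D}$ be a distribution over A, and $s$ a trace. Suppose that $\mathscr{D} \xrightarrow{s} \mathscr{E}$. Then $F(\mathscr{D}) \xrightarrow{s} F(\mathscr{E})$.

Proof. It is in fact sufficient to show:

• if $\mathscr{D} \xrightarrow{\alpha} \mathscr{E}$, then there exists $\mathscr{G}$ such that $\mathscr{E} \to^{*} \mathscr{G}$ and $F(\mathscr{D}) \xrightarrow{\alpha} F(\mathscr{G})$. Whatever the last rule used in the derivation of $\mathscr{D} \xrightarrow{\alpha} \mathscr{E}$, it is of the form $\mathscr{D} = \mathscr{F} + p \cdot \{(C, \phi, K)^1\} \xrightarrow{\alpha} \mathscr{F} + p \cdot \mathscr{H} = \mathscr{E}$, with $\{(C, \phi, K)^1\} \xrightarrow{\alpha} \mathscr{H}$. We then have to consider all the possible elements of $S(\mathscr{H})$;

• and if $\mathscr{D} \to \mathscr{E}$, then $F(\mathscr{D}) = F(\mathscr{E})$.

□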
6.1.4 Link Between Trace Semantics on Terms and Trace Semantics on A


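Definition 32 Let $\mathscr{D}$ and $\mathscr{E}$ be two distributions over A. For $\varepsilon > 0$, we say that $\mathscr{D}$ and $\mathscr{E}$ are $\varepsilon$-related if there exist positive reals $p_1, \ldots, p_d$, pairwise distinct contexts $D_1, \ldots, D_d$, and distributions on tuples $\mathscr{K}_1, \ldots, \mathscr{K}_d$ and $\mathscr{H}_1, \ldots, \mathscr{H}_d$ such that:

$$\mathscr{D} = \sum_{j=1}^{d} p_j \cdot (D_j, \mathscr{K}_j), \qquad \mathscr{E} = \sum_{j=1}^{d} p_j \cdot (D_j, \mathscr{H}_j), \qquad \sum_{j=1}^{d} p_j \cdot \delta^{tp}(\mathscr{K}_j, \mathscr{H}_j) < \varepsilon.$$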

Lemma 38 The relation on distributions over A is strongly normalizing. 

Lemma 39 Let $\mathscr{D}$, $\mathscr{E}$ be two $\varepsilon$-related distributions. Then $\mathscr{D}^*$ and $\mathscr{E}^*$ are related.

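Lemma 40 Let $\mathscr{D}$, $\mathscr{E}$ be two $\varepsilon$-related distributions, and let $\mathscr{D}^*$, $\mathscr{E}^*$ be in normal form and such that $\mathscr{D} \to^{*} \mathscr{D}^*$ and $\mathscr{E} \to^{*} \mathscr{E}^*$. Then $\mathscr{D}^*$ and $\mathscr{E}^*$ are $\varepsilon$-related.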

Theorem 7 follows from Lemma 40 in a similar way as for the trace distance.

□ 

Theorem 7 can be read as a non-expansiveness result: if we have a system E, playing the role of the environment and prepared to interact with n components, and two tuples K and H of length n, then the tuple distance between K and H gives an upper bound on the trace distance between the system composed of E interacting with K and the system composed of E interacting with H.

We can now see that the tuple distance coincides with the context metric: one inequality comes from Theorem 7, the other from the fact that any trace of the tuple LMC which starts from a single value can be simulated by a context.

Theorem 8 On programs, the tuple distance coincides with the context distance.

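Proof. • We apply Theorem 7 to $\lambda x.M$ and $\lambda x.N$, which are values, and to the context $C = [\cdot]$. Writing $\delta^{c}$, $\delta^{t}$ and $\delta^{tp}$ for the context, trace and tuple distances respectively, we obtain:

$$\delta^{c}(M, N) = \delta^{t}(\lambda x.M, \lambda x.N) \le \delta^{tp}([\lambda x.M], [\lambda x.N]) = \delta^{tp}(M, N).$$

• Let $s$ be a trace in the LMC which starts from a single value. Then we can find a context that simulates this trace.

□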

6.2 Examples 

The tuple distance, which we have just proved to be fully abstract, can be seen as yet another presentation of the context distance. But there is much more: it allows us to evaluate the distance between concrete programs, even when they contain pairs, in a relatively easy way. In this section, we give two examples.
6.2.1 A Simple Example

Consider the terms M and N defined in Example 3. We can prove that their tuple distance is $\delta^{tp}(M, N) = \frac{3}{4}$. We first show that $\delta^{tp}(M, N) \ge \frac{3}{4}$. To do so, we present a particular trace $s$ such that $|\Pr([M], s) - \Pr([N], s)| = \frac{3}{4}$. More precisely, we take $s = \mathsf{unfold}^1 \cdot @(\emptyset, I)^1 \cdot @(\emptyset, I)^2$: it corresponds to first separating the two components of the pair, and then passing I as an argument to the first and to the second component. The relevant fragment of the tuple LMC can be found in Figure 12. In particular, we can see that $\Pr([M], s) = 1$ and $\Pr([N], s) = \frac{1}{4}$.

[Figure 12: The relevant fragment of the tuple LMC, starting from $[M] = [\langle \lambda x.I, \lambda x.I \rangle]$ and $[N] = [\langle \lambda x.(I \oplus \Omega), \lambda x.(I \oplus \Omega) \rangle]$]

Now we want to show the reverse inequality, namely that $\delta^{tp}(M, N) \le \frac{3}{4}$. For that, we use the alternative characterisation of trace distance: it is sufficient to find a $\frac{3}{4}$-bisimulation $\mathcal{R}$ on the LTS of distributions such that $(\{[M]^1\}, \{[N]^1\}) \in \mathcal{R}$.
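The lower bound can be double-checked mechanically. The following Python sketch encodes only the fragment of the tuple LMC discussed above as a finite table; the state names and the dictionary encoding are assumptions made for illustration, and any missing probability mass stands for divergence ($\Omega$). The function `pr` computes the probability that a trace can be performed from a given state, i.e. $\Pr(\cdot, s)$.

```python
from fractions import Fraction

# Hypothetical encoding of the relevant fragment of the tuple LMC.
# Each (state, action) maps to a sub-distribution over successor states;
# probability mass that is missing corresponds to divergence (Omega).
M0, M1, M2, M3 = "[<λx.I,λx.I>]", "[λx.I,λx.I]", "[I,λx.I]", "[I,I]"
N0, N1, N2 = "[<λx.(I⊕Ω),λx.(I⊕Ω)>]", "[λx.(I⊕Ω),λx.(I⊕Ω)]", "[I,λx.(I⊕Ω)]"
UNFOLD1, APP1, APP2 = "unfold^1", "@(∅,I)^1", "@(∅,I)^2"

HALF = Fraction(1, 2)
LMC = {
    (M0, UNFOLD1): {M1: Fraction(1)},
    (M1, APP1): {M2: Fraction(1)},
    (M2, APP2): {M3: Fraction(1)},
    (N0, UNFOLD1): {N1: Fraction(1)},
    (N1, APP1): {N2: HALF},  # λx.(I ⊕ Ω) applied to I yields I with probability 1/2
    (N2, APP2): {M3: HALF},
}


def pr(state, trace):
    """Probability of performing every action of `trace`, in order, from `state`."""
    dist = {state: Fraction(1)}
    for action in trace:
        nxt = {}
        for st, p in dist.items():
            for succ, q in LMC.get((st, action), {}).items():
                nxt[succ] = nxt.get(succ, Fraction(0)) + p * q
        dist = nxt
    return sum(dist.values(), Fraction(0))


s = [UNFOLD1, APP1, APP2]
pM, pN = pr(M0, s), pr(N0, s)
```

With exact fractions, `pM - pN` evaluates to $\frac{3}{4}$, the separation claimed for the trace $s$; the reverse inequality still requires the bisimulation argument.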

6.2.2 A More Complicated Example 

Recall the example we presented in Section 3. We denote by $(u_n)_{n \in \mathbb{N}}$ the sequence defined as $u_n = \prod_{1 \le i \le n} \left(1 - \frac{1}{2^i}\right)$. Observe that the sequence $(u_n)_{n \in \mathbb{N}}$ has a limit strictly between 0 and 1.

Lemma 41 The sequence $(u_n)_{n \in \mathbb{N}}$ has a limit $l$, and $1 > l > 0$.

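Proof. • $(u_n)_{n \in \mathbb{N}}$ is decreasing and bounded from below by 0: it has a limit $l$.

• $l \le u_1 = \frac{1}{2} < 1$.

• We consider the sequence $v_n = \log u_n = \sum_{1 \le i \le n} \log\left(1 - \left(\frac{1}{2}\right)^i\right)$, and we set $w_i = \log\left(1 - \left(\frac{1}{2}\right)^i\right)$. Then we consider:

$$\frac{w_{n+1}}{w_n} = \frac{\log\left(1 - \left(\frac{1}{2}\right)^{n+1}\right)}{\log\left(1 - \left(\frac{1}{2}\right)^{n}\right)} \xrightarrow[n \to \infty]{} \frac{1}{2}.$$

D'Alembert's ratio test then implies that the series $\sum_i w_i$ converges, so $v_n$ has a finite limit, and therefore $l = e^{\lim v_n} > 0$.

□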
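A quick numerical sanity check of Lemma 41, as a Python sketch using floating-point arithmetic (so it illustrates the statement rather than proving it): it computes $u_n$, checks that the sequence is non-increasing and that its value stays strictly between 0 and $\frac{1}{2}$, and shows that the ratio $w_{n+1}/w_n$ used in the ratio test approaches $\frac{1}{2}$. The function names are our own.

```python
import math


def u(n: int) -> float:
    """u_n = prod_{1 <= i <= n} (1 - (1/2)^i); u_0 = 1 is the empty product."""
    prod = 1.0
    for i in range(1, n + 1):
        prod *= 1.0 - 0.5 ** i
    return prod


def w(i: int) -> float:
    """w_i = log(1 - (1/2)^i), so that log u_n = w_1 + ... + w_n."""
    return math.log1p(-(0.5 ** i))


us = [u(n) for n in range(50)]                   # u_0, ..., u_49
ratios = [w(i + 1) / w(i) for i in range(1, 40)]  # w_{i+1} / w_i for i = 1, ..., 39
```

The values of `us` settle at roughly 0.2888, and the last entries of `ratios` are within rounding error of 0.5, matching the two bullets of the proof. Using `math.log1p` avoids the cancellation that `math.log(1 - x)` suffers for tiny x.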


Theorem 9 For every $n \in \mathbb{N}$, $\delta^{tp}(M_n, N_n) = 1 - u_n$.

Proof. We first show that $\delta^{tp}(M_n, N_n) \ge 1 - u_n$. As in the previous example, we do so by finding, for each $n \in \mathbb{N}$, a trace $s_n$ such that $|\Pr([M_n], s_n) - \Pr([N_n], s_n)| = 1 - u_n$. We define the sequence $(s_n)_{n \in \mathbb{N}}$ inductively as follows:

$$s_0 = \varepsilon \qquad s_{n+1} = \mathsf{unfold}^1 \cdot @(\emptyset, I)^1 \cdot s_n$$

$s_0$ is the trace which always succeeds, whatever the starting state is. $s_{n+1}$ corresponds to separating the two components of the pair which is in first position in the tuple, then passing the identity as an argument to the first component of this pair, and then executing $s_n$. For this sequence of traces, the recursive equations of Figure 13 are verified (the proof can be found in [5]).

$$\begin{aligned}
\Pr([M_0], s_0) &= 1 & \Pr([N_0], s_0) &= 1\\
\Pr([M_{n+1}], s_{n+1}) &= 1 \cdot \Pr([M_n], s_n) & \Pr([N_{n+1}], s_{n+1}) &= \left(1 - \tfrac{1}{2^{n+1}}\right) \cdot \Pr([N_n], s_n)
\end{aligned}$$

Figure 13: Recursive equations verified by $s_n$

By solving these equations, we see that for every $n \in \mathbb{N}$, $\Pr([M_n], s_n) = 1$ and $\Pr([N_n], s_n) = u_n$. As a direct consequence, we obtain the result. We now want to show that $\delta^{tp}(M_n, N_n) \le 1 - u_n$. To do that, we need to establish that there is no trace $t$ such that $|\Pr([M_n], t) - \Pr([N_n], t)| > 1 - u_n$.
We are in fact going to show something stronger: for every $n \in \mathbb{N}$, we define a set $A_n$ of pairs of tuples which contains the pair $([M_n], [N_n])$, and such that for every $(K, H) \in A_n$ and every trace $t$, $|\Pr(K, t) - \Pr(H, t)| \le 1 - u_n$. Intuitively, the idea behind the sequence $(A_n)_{n \in \mathbb{N}}$ is the following: if we start from $[M_n]$, perform a trace of even length, and end up in a tuple $K$ with non-zero probability, and if performing the same trace from $[N_n]$ ends up in the tuple $H$, then the pair of tuples $(K, H)$ is in one of the $A_j$, with $j$ smaller than $n$.

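Definition 33 Let $n \in \mathbb{N}$. Let $A_n$ be the set of pairs $(K, H)$ such that there exist $m \in \mathbb{N}$ and $k_i \ge n + 1$ (for $1 \le i \le m$) with:

$$K = [M_n, [\lambda x.\Omega]^m]; \qquad H = [N_n, [\lambda x.(\Omega \oplus_{2^{k_i}} I)]_{1 \le i \le m}].$$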

We now want to give an upper bound on the separation that any trace can induce between $K$ and $H$, when $(K, H) \in A_n$.

Lemma 42 For every $n \in \mathbb{N}$ and every $(K, H) \in A_n$, we can partition the set of traces as:

$$\mathit{Tr} = \{s \mid \Pr(K, s) = 0 \text{ and } \Pr(H, s) \le \tfrac{1}{2}\} \uplus \{s \mid \Pr(K, s) = 1 \text{ and } \Pr(H, s) \ge u_n\}.$$


Proof. Let $s \in \mathit{Tr}$. We show by induction on the length of $s$ that for every $n \in \mathbb{N}$ and every $(K, H) \in A_n$, either $\Pr(K, s) = 0$ and $\Pr(H, s) \le \frac{1}{2}$, or $\Pr(K, s) = 1$ and $\Pr(H, s) \ge u_n$.

• If $s = \varepsilon$, then for every $n \in \mathbb{N}$ and $(K, H) \in A_n$, $\Pr(K, s) = \Pr(H, s) = 1$, and we are in the second case.

• If the length of $s$ is $\ell > 0$, let $n \in \mathbb{N}$ and $(K, H) \in A_n$. Then we can write:

$$K = [M_n, [\lambda x.\Omega]^m]; \qquad H = [N_n, [\lambda x.(\Omega \oplus_{2^{k_i}} I)]_{1 \le i \le m}]$$

with $k_i \ge n + 1$.


We now distinguish cases depending on which element of the tuple the first action of the trace is applied to.

• If the first action is not applied to the first element of the tuple, then $s = @(\Gamma, C)^j \cdot t$ with $j > 1$. Then $\Pr(K, s) = 0$ and $\Pr(H, s) \le \frac{1}{2}$: we are in the first case.

• If the first action is applied to the first element of the tuple, then $s = \mathsf{unfold}^1 \cdot t$ (since the first element of the tuple is a pair, the only action that can be applied to it is the unfold action).
• First, consider the case $n = 0$. Recall that, by definition, $M_0 = N_0 = \langle \lambda x.\Omega, \lambda x.\Omega \rangle$. Observe that unfolding the first component leads, with probability 1, from $K$ and $H$ respectively to:

$$K_1 = [[\lambda x.\Omega]^{m+2}]; \qquad H_1 = [\lambda x.\Omega, \lambda x.\Omega, [\lambda x.(\Omega \oplus_{2^{k_i}} I)]_{1 \le i \le m}].$$

With these notations, $\Pr(K, s) = \Pr(K_1, t)$ and $\Pr(H, s) = \Pr(H_1, t)$. If $t = \varepsilon$, these two expressions are equal to 1, and we are in the second case. Otherwise, $\Pr(K_1, t) = 0$ and $\Pr(H_1, t) \le \frac{1}{2}$, and we are in the first case.

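• Now consider the case $n \ge 1$. Recall that:

$$M_n = \langle \lambda x.M_{n-1}, \lambda x.\Omega \rangle; \qquad N_n = \langle \lambda x.(N_{n-1} \oplus_{2^n} \Omega), \lambda x.(\Omega \oplus_{2^n} I) \rangle.$$

We then write:

$$K_2 = [\lambda x.M_{n-1}, \lambda x.\Omega, [\lambda x.\Omega]^m]; \qquad H_2 = [\lambda x.(N_{n-1} \oplus_{2^n} \Omega), \lambda x.(\Omega \oplus_{2^n} I), [\lambda x.(\Omega \oplus_{2^{k_i}} I)]_{1 \le i \le m}].$$

With these notations, $\Pr(K, s) = \Pr(K_2, t)$ and $\Pr(H, s) = \Pr(H_2, t)$. We now consider the different possible forms of the trace $t$: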

• if $t = \varepsilon$, $\Pr(K, s) = \Pr(H, s) = 1$, and we are in the second case;

• if $t = @(\Gamma, C)^j \cdot u$ with $j > 1$, we have $\Pr(K_2, t) = 0$ and $\Pr(H_2, t) \le \frac{1}{2^n} \le \frac{1}{2}$, and we are in the first case;

• if $t = @(\Gamma, C)^1 \cdot u$, recall the semantics of this action in the Markov chain. If we start from $K_2$, with probability 1 we go to a state $K_3$ of the form

$$K_3 = [M_{n-1}, [\lambda x.\Omega]^{l}]$$

with $l \le m + 1$. If we start from $H_2$, with probability $1 - \frac{1}{2^n}$ we go to a state $H_3$ of the form

$$H_3 = [N_{n-1}, [\lambda x.(\Omega \oplus_{2^{k_i}} I)]_{1 \le i \le l}]$$

with $k_i \ge n$. Now we can see that:

$$\Pr(K, s) = \Pr(K_3, u); \qquad \Pr(H, s) = \left(1 - \tfrac{1}{2^n}\right) \cdot \Pr(H_3, u).$$

Moreover, observe that $(K_3, H_3) \in A_{n-1}$, so we can apply the induction hypothesis (since the length of $u$ is strictly smaller than the length of $s$). There are two possible cases:

• $\Pr(K_3, u) = 0$ and $\Pr(H_3, u) \le \frac{1}{2}$. Then the result holds, since this implies $\Pr(K, s) = 0$ and $\Pr(H, s) \le \left(1 - \frac{1}{2^n}\right) \cdot \frac{1}{2} \le \frac{1}{2}$.

• $\Pr(K_3, u) = 1$ and $\Pr(H_3, u) \ge u_{n-1}$. Then the result holds, since this implies $\Pr(K, s) = 1$ and $\Pr(H, s) \ge \left(1 - \frac{1}{2^n}\right) \cdot u_{n-1} = u_n$.

□ 


The result we are seeking is a direct consequence of Lemma 42: for any trace $s$, if $(K, H) \in A_n$, the separation that $s$ can induce is at most $1 - u_n$. Indeed, let $s \in \mathit{Tr}$. Since $([M_n], [N_n]) \in A_n$, either:

• the trace $s$ is in the first set of the partition given by Lemma 42, and $|\Pr([M_n], s) - \Pr([N_n], s)| \le \frac{1}{2} \le 1 - u_n$; or

• the trace $s$ is in the second set of this partition, and then $|\Pr([M_n], s) - \Pr([N_n], s)| \le 1 - u_n$.

□
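The arithmetic behind both halves of the proof of Theorem 9 can be cross-checked with exact rational numbers. The Python sketch below (all function names are ours) unfolds the recursive equations of Figure 13, checks that they give $\Pr([M_n], s_n) - \Pr([N_n], s_n) = 1 - u_n$, and checks the arithmetic facts used in the inductive step of Lemma 42 and in the final argument. It verifies only the arithmetic, not the operational claims about the LMC.

```python
from fractions import Fraction


def u(n: int) -> Fraction:
    """u_n = prod_{1 <= i <= n} (1 - 1/2^i), computed exactly."""
    prod = Fraction(1)
    for i in range(1, n + 1):
        prod *= 1 - Fraction(1, 2 ** i)
    return prod


def pr_M(n: int) -> Fraction:
    """Pr([M_n], s_n) from Figure 13: 1 for n = 0, then multiplied by 1 at each step."""
    return Fraction(1) if n == 0 else 1 * pr_M(n - 1)


def pr_N(n: int) -> Fraction:
    """Pr([N_n], s_n) from Figure 13: 1 for n = 0, then (1 - 1/2^n) * Pr([N_{n-1}], s_{n-1})."""
    return Fraction(1) if n == 0 else (1 - Fraction(1, 2 ** n)) * pr_N(n - 1)


def check(up_to: int) -> bool:
    """Check the identities used in the proof for all n <= up_to."""
    half = Fraction(1, 2)
    for n in range(up_to + 1):
        # Lower bound: the trace s_n separates [M_n] and [N_n] by exactly 1 - u_n.
        if pr_M(n) - pr_N(n) != 1 - u(n):
            return False
        if n >= 1:
            factor = 1 - Fraction(1, 2 ** n)
            # Lemma 42, second case: (1 - 1/2^n) * u_{n-1} = u_n.
            if factor * u(n - 1) != u(n):
                return False
            # Lemma 42, first case: the bound 1/2 is preserved.
            if not factor * half <= half:
                return False
            # Final step: 1/2 <= 1 - u_n.
            if not half <= 1 - u(n):
                return False
    return True
```

For instance, `pr_N(2)` is $\frac{3}{8}$, so the separation induced by $s_2$ is $\frac{5}{8} = 1 - u_2$.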
6.3 On Tuples and Copying

The tuple distance naturally suggests a way to handle λ-calculi in which copying is allowed. Although the details are clearly outside the scope of this paper, we want to give some hints about why this is the case.

What makes the trace and behavioural distances unsound in the presence of copying is their inability to capture an environment which can access the program at hand more than once. In our view, however, the problem does not come from the way those distances are defined in the abstract, but rather from the way the underlying LMC reflects the operational semantics of the calculus at hand. In a sense, it is the responsibility of the LMC to guarantee that the environment can access terms multiple times. For example, the LMC we introduced in this paper (which is close to the ones from the literature IIEIIS]) is not adequate.

Suppose, however, that we extend this approach to an LMC for a λ-calculus in the style of Wadler's linear λ-calculus [53]: there, the grammar of terms includes a construct !M whose purpose is to mark those subterms which can be duplicated. The actions the environment can perform on a term of the form !M simply reflect the above: the environment can create a new copy of !M, but also keeps the possibility to access !M in the future. One immediately realises that tuples are the right way to model the access to both !M and M.
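The following Python sketch illustrates this idea under explicit assumptions: the paper does not define such an LMC, so the `Bang` marker and the `copy_action` transition are hypothetical names of ours. The point it shows is only the shape of the state: copying a component of the form !M extends the tuple with a fresh M while leaving !M in place, so the environment can copy it again later.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Bang:
    """Hypothetical marker for a duplicable term !M (M is kept abstract as a string)."""
    body: str


Component = Union[str, Bang]


def copy_action(tup: Tuple[Component, ...], j: int) -> Tuple[Component, ...]:
    """Hypothetical environment action on the j-th component (1-based) of a tuple.

    If that component has the form !M, a fresh copy M is appended to the tuple,
    while !M itself stays in place and remains accessible for later copies.
    """
    comp = tup[j - 1]
    if not isinstance(comp, Bang):
        raise ValueError("only components of the form !M can be copied")
    return tup + (comp.body,)
```

Starting from the one-element tuple containing `Bang("M")`, two successive copies give a tuple containing `Bang("M")` followed by two copies of `"M"`, which is exactly the simultaneous access to !M and M that tuples make possible.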

7 Conclusions 

We have initiated the study of metrics in higher-order languages, starting with the relatively easy case of affine λ-terms, where copying capabilities are simply not available. We showed that three different notions of distance are sound (and sometimes fully abstract) for the context distance, the natural generalisation of Morris' observational equivalence. One of them, the tuple distance, reflects the inherently monoidal structure of the underlying calculus, thus allowing us to solve some nontrivial distance problems.

We are actively working on extending the results described here to the non-affine case, which for various reasons turns out to be more difficult, as discussed in Section 2. We are in particular quite optimistic about the possibility of generalising the tuple distance to a metric reflecting copying. The real challenge, however, consists in handling the case in which copying is available, but the number of copies of a given term the environment can access is bounded, perhaps polynomially in the value of a security parameter. That would be a way to get closer to computational indistinguishability, a central notion in modern cryptography.